Overview of linguistic annotation


Linguistic examples quoted in the chapters are given interlinear glosses and 
English translations. The glossing conventions followed here are laid out in the 
following sections. 


1 Glossing of Old Irish examples 


Nouns are glossed with their translational equivalent and followed by the case
(NOM, ACC, GEN, DAT) in subscript small capitals. Singular number is viewed here as
default and is not glossed. Plural nouns are glossed with the tag PL, added after
the case abbreviation following a full stop (e.g. DAT.PL).


(1) feraib
menDAT.PL

(2) geinti
gentilesNOM.PL


Adjectives are glossed with their translational equivalent and followed by case,
number (SG, PL), and gender (MASC, FEM, NEUT) in subscript small capitals, each tag
separated by a full stop.


(3) móir
bigACC.SG.FEM


The definite article and other prenominal modifiers (such as quantifiers) are, 
generally speaking, glossed in the same way as an adjective. However, when 
the definite article is found immediately before a stressed demonstrative, no 
gender features are tagged since the demonstrative itself lacks clearly discern- 
ible gender features. 


(4) a. in fer
theNOM.SG.MASC manNOM

b. in só
theNOM.SG thisNOM
The unstressed demonstrative particles, -sin distal (‘that’) and -so proximal
(‘this’), are glossed respectively as DIST and PROX. These tags are attached to
the preceding item with the equals sign. Stressed demonstratives are tagged as 
nouns, as in (4b) above. 


(5) a. in fer-sin
theNOM.SG.MASC manNOM=DIST
b. in fer-so
theNOM.SG.MASC manNOM=PROX


The stressed anaphoric pronoun, suide (in all case forms) is glossed with the tag 
ANAPH followed by case and number tags in subscript capitals with full stops 
between each tag type. Note that, as with nouns, singular is default and is not 
tagged. The unstressed anaphoric particle, which has the forms side, sidi, ade, 
de, adi, di, is only glossed with the tag ANAPH. 


(6) a. trisodin
through=ANAPHACC
b. achotlud adi
his=sleepNOM ANAPH


Prepositional pronouns are glossed with the translational equivalent of the basic 
preposition followed by tags for person, number, gender, and case (in that order) 
in subscript small capitals. Tags for gender and case are separated from the tags 
for person and number with a full stop. The case tag is only used to disambiguate
between the two possible cases (accusative and dative) governed by the subset of
prepositions which can govern both of these cases. If the preposition only ever
governs one case, the case is not indicated in the glossing.


(7) a. dóib
to3PL
b. foir
on3SG.MASC.ACC
c. for
on3SG.MASC.DAT


Verbs are glossed with their translational equivalent and followed by abbreviations
in subscript small capitals for agreement, tense, mood, passive and relative
(in that order) with a full stop between each abbreviation. The abbreviations
used are listed in (8). Note that indicative mood is here conceived of as the default
and is not glossed.


(8) a. Tense: PRES (present), IMPF (imperfect), PST (past, only in past subjunctive),
PRET (preterite), FUT (future).
b. Mood: SUBJ (subjunctive), COND (conditional), IMPV (imperative).
c. Passive forms are tagged PASS; relative forms are tagged REL.
d. Agreement: 1SG, 2SG, 3SG, 1PL, 2PL, 3PL.
e. The augment is tagged AUG, either in preverb position or as a subscript tag on the verb (see below).


The sequence of glosses in verbs and examples of the method of glossing is
given in (9). AUG has two positions. If it is the first preverb in the verbal complex,
it is treated as a PV (see below), as in (9a). If it is not the first preverb
in the verbal complex, it is glossed as in (9c).


(9) a. ro·berthae
AUG·bring3SG.PST.SUBJ.PASS
b. berthar
bring3SG.PRES.SUBJ.PASS.REL
c. inroigrainn
PV·persecuteAUG.3SG.PRET


For compound verbs, the lexical preverb is glossed separately as PV in capitals.
Preverbs are separated from verbal roots by a raised dot in the glossing, even
when the dot does not appear in the quoted example. Where present, infixed
pronouns (glossed as 1SG, 2SG, 3SGMASC, 3SGFEM, 3SGNEUT, 1PL, 2PL, 3PL) are inserted
after the PV (or AUG) after a hyphen. If relevant, the class type is added in parentheses
in superscript afterwards (e.g. 3SGNEUT(A), 3SGNEUT(B), 3SGNEUT(C)). The hyphen
is also used for the infixed relative, which is glossed REL, in prepositional
relatives and after the preverbs imm and ar.

Consonant mutations play an important role in all Insular Celtic languages.
In Old Irish, there are two prominent ones: lenition and nasalisation. Lenition
causes an initial stop to become a fricative; nasalisation causes initial voiceless
stops to become voiced and prefixes a homorganic nasal to initial voiced stops
and vowels. The mutations are glossed as superscript LEN and NAS respectively
before the mutated form. Examples that follow these rules are given in (10).


(10) a. as·beir
PV·say3SG.PRES
b. at·beir
PV-3SGNEUT·say3SG.PRES
c. as·mbeir
PV·NASsay3SG.PRES
d. rondasaibset
AUG-NAS3SGFEM·pervert3PL.PRET
e. immetét
PV-REL·surround3SG.PRES


Old Irish possesses a series of pronominal clitics that serve, roughly speaking, 
to emphasise items to which they cliticise. In traditional Irish grammar, these 
are called notae augentes. They are glossed with 1SG, 2SG, 3SGMASC, 3SGFEM,
3SGNEUT, 1PL, 2PL, 3PL. These abbreviations are not in super/subscript. They are
separated from the glosses for the stressed word with an equal sign (=) as in 
(11); see also below. 


(11) as·beir=som
PV·say3SG.PRES=3SGMASC/NEUT


The example itself is presented using the editorial conventions of the edition 
cited. For example, if the edition does not use a raised dot to separate preverb 
from root, or a hyphen or equals sign to separate a nota augens from the verb, 
these are not inserted into the main text of the example. Punctuation is only 
inserted into the gloss as in (12). 


(12) asbeirsom
PV·say3SG.PRES=3SGMASC/NEUT


In the gloss, an equal sign is used to separate an unstressed element from a 
stressed element (13), when the two are not separated by a space in the edition 
cited. A hyphen is used to separate an unstressed element from another un- 
stressed element (14). A period is inserted between the words of translational 
equivalents where these consist of two or more words (15). An underscore is 
used between two possibly stressed items that are written without separation in 
the example (16). 


(13) isuidiu
in=ANAPHDAT


(14) a. arní
for-NEG
b. donaibferaib
for-theDAT.PL.MASC=menDAT.PL


(15) mórabba 
great.cause,«. 


(16) ísíu 
DEICT. యకం 


Note that (16) shows that the deictic particle í is glossed as DEICT. The negative
particles are glossed NEG (ní, in main clauses), NEGSUB (na/nach/nad, in subordinate
non-relative clauses), and NEGREL (in relative clauses).
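
The boundary symbols described above can be illustrated with a short Python sketch that splits a gloss word at each symbol. This is only an illustration and not part of the volume's annotation workflow: the names `BOUNDARIES` and `segment_gloss` are hypothetical, the raised dot is assumed to be encoded as the middle dot (U+00B7), and subscript tags are assumed to be written inline directly after the translational equivalent. The full stop is deliberately not treated as a boundary, since it occurs both between tags and between the words of a multi-word translational equivalent.

```python
import re

# Boundary symbols used in the Old Irish glosses (descriptions paraphrase the text above).
BOUNDARIES = {
    "=": "unstressed element attached to a stressed element",
    "-": "unstressed element attached to another unstressed element",
    "_": "two possibly stressed items written together",
    "·": "preverb separated from the verbal root (assumed encoding: U+00B7)",
}

# A capturing group makes re.split keep the boundary symbols in the result.
_SPLITTER = re.compile("([" + re.escape("".join(BOUNDARIES)) + "])")


def segment_gloss(gloss):
    """Split one gloss word into alternating segments and boundary symbols."""
    return [part for part in _SPLITTER.split(gloss) if part]


if __name__ == "__main__":
    for example in ("for-theDAT.PL.MASC=menDAT.PL", "PV-REL·surround3SG.PRES"):
        for part in segment_gloss(example):
            # Print the meaning of each boundary symbol, or the segment itself.
            print(part, "<-", BOUNDARIES.get(part, "segment"))
```

Under these assumptions, the gloss of (14b) is divided into the segments `for`, `theDAT.PL.MASC`, and `menDAT.PL`, with the hyphen and the equals sign preserved between them. Tags that themselves contain a hyphen, such as the Brittonic PST-PTCPL, would need separate handling.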


2 Glossing of Brittonic examples 


The glossing of Brittonic examples is somewhat different from the glossing of 
Old Irish. These differences are exemplified below. 
Nouns and adjectives are glossed with their translational equivalent only.¹


(17) a. gwin 

wine 

b. riuedi 
numbers 


(18) margh uskis 
horse swift 


The definite article is glossed as DEF. 


(19) r llys 
DEF court 


1 Very occasionally, subscript small capital PL is used to disambiguate a plural form of an adjective
from a non-plural form (e.g. Welsh eraill is glossed otherPL). Certain numerals have feminine
and masculine forms. These are distinguished with subscript small capital FEM and MASC
(e.g. tri threeMASC vs tair threeFEM).
All pronouns in Brittonic are tagged with the appropriate agreement tag (1SG,
2SG, 3SG, 1PL, 2PL, 3PL) and, if necessary, the following tags in subscript capitals:
MASC, FEM, POSS (possessive), INFX (infixed), INTS (intensifier), REFL (reflexive).


(20) a. y penn
3SGMASC.POSS head
b. a ’e lladwn ef.
PTCL 3SGMASC.INFX kill1SG.SUBJ.IMPF 3SGMASC
c. dy hun 
25635 
d. dy hun 


All demonstratives in Brittonic are tagged as either DIST (distal) or PROX 
(proximal). 


(21) a. henna 
DIST 
b. an den ma 
DEF man PROX 
c. hynny 
PROX 


Verbs are glossed with their translational equivalent and followed by abbrevia- 
tions in subscript small capitals for agreement, tense, mood, and impersonal 
(in that order) with a full stop between each abbreviation. The abbreviations 
used are listed in (22). Note that indicative mood is here conceived of as the 
default and is not glossed. 


(22) a. Agreement: 1SG, 2SG, 3SG, 1PL, 2PL, 3PL.
b. Tense: PRES (present), PRET (preterite), FUT (future), IMPF (imperfect), PLPF
(pluperfect), HAB (habitual).
c. Mood: SUBJ (subjunctive), COND (conditional), IMPV (imperative).
d. IMPS (impersonal)
e. The perfective particle re, ry, 7 (etc.) is tagged PERF. 


The sequence of glosses in verbs and examples of the method of glossing is 
given in (23). 


(23) a. ledy
kill2SG.PRES
b. deuthant
come3PL.PRET
c. lladwn
kill1SG.IMPF.SUBJ
d. wnathoed
do3SG.PLPF
e. bythynt
be3PL.HAB


The particle ym- (also spelled em-) is glossed PV. This is separated from verbal
roots by a raised dot in the glossing. Infixed pronouns (glossed as 1SGINFX, etc.)
are separated from the verb and supporting particles by whitespace. Examples
that follow these rules are given in (24).


(24) a. ym·dodant
PV·melt3PL.PRES
b. re gowsys
PERF speak3SG.PRET
c. ny 's gwna e hun 


NEG 3SGmasc.nr MAKE356.pres ల 


Other verb-related glosses are: VN (verbal noun), PST-PTCPL (past participle),
PTCPL (participle), all subscript small capitals.

Negative particles are glossed NEG, with subscript SUB used for the subordi- 
nate negative, where necessary. The predicative particle (yn in Welsh) is glossed 
PRED. The progressive particle (ow in Cornish) is glossed PROG. Other particles 
are glossed PTCL. 


3 List of abbreviations 


1           1st person
2           2nd person
3           3rd person
A           Class A pronouns
ACC         Accusative
ANAPH       Anaphor
AUG         Augment
B           Class B pronouns
C           Class C pronouns
COND        Conditional
DAT         Dative
DEF         Definite
DEICT       Deictic particle í
DIST        Distal Demonstrative
FEM         Feminine
FUT         Future
GEN         Genitive
HAB         Habitual
IMPF        Imperfect
IMPS        Impersonal
IMPV        Imperative
INF         Infinitive
INFX        Infix
INTS        Intensifier
LEN         Lenition
MASC        Masculine
NAS         Nasalisation
NEG         Negation
NEUT        Neuter
NOM         Nominative
PASS        Passive
PERF        Perfect
PL          Plural
PLPF        Pluperfect
POSS        Possessive
PRED        Predicative Particle
PRES        Present
PRET        Preterite
PROG        Progressive
PROX        Proximal Demonstrative
PST         Past (Subjunctive)
PST-PTCPL   Past passive participle
PTCPL       Participle
PV          Preverb
REFL        Reflexive
REL         Relative
SG          Singular
SUB         Subordinate (Negative)
SUBJ        Subjunctive
VN          Verbal Noun


Elliott Lash, Fangzhe Qiu, and David Stifter 
Introduction: Celtic Studies and Corpus 
Linguistics 


1 Background to the volume 


This volume is a collection of eleven chapters that showcase the state of the art in 
corpus-based linguistic analysis of the old, middle and early modern stages of 
Celtic languages (specifically, Old and Middle Irish, Middle Welsh, and Cornish). 
The contributors offer both new analyses of linguistic variation and change and
descriptions of computational tools necessary to process historical language
data in order to create and use electronic corpora. On the whole, the volume represents
a platform for the exploration of corpus approaches to morphosyntactic variation
and change in the Celtic languages and, for the first time, situates Celtic
linguistics in the broader field of computational and corpus linguistics.

These chapters were originally prepared for lectures hosted by the 
Chronologicon Hibernicum project (ChronHib), an ERC-funded project at 
Maynooth University, Ireland (ERC Consolidator Grant 2015, H2020 #647351). 
The lectures took place at three separate workshops (December 15, 2016; April 4,
2017; October 13-14, 2017), which brought together an international group of researchers
with various backgrounds to help the ChronHib team gain insight into
preparing linguistically marked-up text for statistical research on language variation
in Old Irish. At the first event, all aspects of corpus building and use, such as
morphological tagging, syntactic parsing and maintenance and sustainability 
of online databases, were discussed. In subsequent events, two main themes 
emerged: first, the necessity of developing computational tools such as mor- 
phological taggers/analysers and lemmatisers, and second, that careful use of 
corpora with a focus on new search queries yields progress on previously in- 
tractable problems of Celtic morphosyntax. 


2 ChronHib and CorPH 


The overall goal for ChronHib is to develop a statistical methodology of lin- 
guistic dating in order to more precisely date the diachronic development of 
the Early Irish language (Old Irish: seventh to ninth century, Middle Irish: 
tenth to twelfth century) and thereby to predict the age of the large number 
of anonymous, dateless Irish texts. In many ways, too, the early stages of
Brittonic languages present the same problems of anonymous, as yet un- 
dated text (Rodway 2013). In traditional studies of both Goidelic and 
Brittonic material, linguistic dating has typically been a matter of philologi- 
cal and linguistic analysis of manually curated data. ChronHib aims at ad- 
vancing the methods used for linguistic dating of Early Irish by contributing 
to a chronologically more precise description of linguistic variations and by 
employing corpus linguistic and advanced statistical methods. It also en- 
deavours to improve, by means of digital humanities techniques, on the 
availability and reliability of the material basis relevant to the chronology of 
linguistic developments and of the literature of early medieval Ireland (see 
Qiu et al. 2018 for a more in-depth discussion of ChronHib). 

Essentially, ChronHib will produce a new linguistically tagged corpus of Old 
Irish texts. This corpus, called the Corpus Palaeohibernicum (CorPH, Stifter et al. 
2015-) is in the development stage and will soon be freely accessible online. It 
will, firstly, unify some of the existing resources for the study of Old Irish texts 
under one annotation scheme, and secondly, expand the amount of electronic ma- 
terials by digitising and annotating data that have only been available previously 
in printed media or manuscripts. Scholars working on Old Irish, for example, 
have, until now, mainly relied on the data found in the two-volume printed edition 
of Thesaurus Palaeohibernicus (Thes. = Stokes and Strachan 1901-1910). The exist- 
ing digital resources for medieval Irish texts come in a variety of forms: annotated 
lexicons, digital glossaries, text with XML markup, treebanks, and fully digital dic- 
tionaries. For extensive discussion of some of these materials, see Griffith, Stifter, 
and Toner (2018). These heritage data together constitute the corpus on which the 
contributions in this volume are based, and a brief description of them is pertinent 
here. 

The main online dictionary of Early Irish is eDIL (Toner et al. 2019). It enables 
research into semantic, morphological, and syntactic usage of Irish lexemes in sour- 
ces written between the seventh century and 1700. There are, in addition, two major 
digital collections of early Irish texts: the Corpus of Electronic Texts (CELT) hosted by 
University College Cork (Farber 2012) and the Thesaurus Linguae Hibernicae (TLH) 
hosted by University College Dublin (Kelly and Fogarty 2006-2011). These corpora
consist of texts with analytical and structural XML markup following the TEI
guidelines. The usefulness of these textual resources for the corpus linguist is only
indirect, since no linguistic information is tagged. A prominent treebank is the
Parsed Old and Middle Irish Corpus (Lash 2014a), a UPenn-style syntactically tagged 
treebank of fourteen Old Irish texts. The two online annotated lexicons are the 
Milan Glosses database (Griffith and Stifter 2013) and the Priscian Glosses database 
(Bauer 2015; see also Bauer, Hofman, and Moran 2018). These are fully annotated 
for morphological and lexical information. Griffith and Stifter’s (2013) database consists
of around 50,000 morphologically and POS-tagged tokens from the Old Irish
glosses in the Milan manuscript Ambr. C301 infr. (Ml.). Bauer's (2015) database consists
of around 20,000 morphologically and POS-tagged tokens from the Old Irish
glosses in several manuscripts of Priscian's Institutiones Grammaticae, with the St 
Gall Stiftsbibliothek manuscript 904 (Sg.) containing the most extensive collection of 
these glosses. These two databases, along with the Lexicon of the Old Irish glosses in 
the Würzburg manuscript of the Epistles of St. Paul (Wb.; Kavanagh 2001, available 
in print and .pdf formats), have been the catalyst for much research into linguistic 
variation in Old Irish over the past eighteen years. 

The above databases (Ml., Sg.) and lexicon (Wb.) were used by most of the 
contributors in the present volume who studied variation in Old Irish in contem- 
porary (eighth to ninth century) manuscripts. Moreover, many of the texts dis- 
cussed in Liam Breatnach's and Christopher Yocum's contributions can be 
found in the CELT and TLH corpora. The Ml. and Sg. databases have now been 
incorporated into CorPH and stand beside other resources specifically made for 
CorPH such as the Minor Glosses database (Lash 2018), the Annals of Ulster database
(Qiu 2019), and the Poems of Blathmac database (Barrett 2018a). In total,
CorPH has over 120,000 fully annotated tokens of Old Irish text in various genres
(chief among them glosses, annals, and poetry) and will allow researchers easy access
to a large amount of data for research on linguistic variation. Some chapters
in this volume (for example, Elisa Roma's and Theodorus Fransen's) have al- 
ready made use of data from CorPH. 

For the other well-attested medieval Celtic language, Middle Welsh 
(c. 1150-1500), authoritative editions have long served as the standard corpus for 
scholars. Meanwhile, two online, searchable corpora have been published, cover- 
ing the majority of prose texts surviving from before 1425: Rhyddiaith Gymraeg o 
Lawysgrifau'r 13eg Ganrif (Isaac et al. 2013) and Rhyddiaith Gymraeg 1300-1425 
(Luft, Thomas, and Smith 2013). These form the basis of Britta Irslinger's investigation
in this volume, and a more detailed description can be found in that contribution.
The late medieval and early modern period of the Welsh language is
represented by the Corpws Hanesyddol yr Iaith Gymraeg 1500-1850 (Willis and
Mittendorf 2004), which contains about 420,000 words from 30 texts in a variety
of genres. However, these corpora have not been linguistically tagged and there- 
fore their usefulness is somewhat limited. The contribution by Marieke Meelen 
aims to tackle this lacuna by developing tagging methods for part of the prose 
corpora mentioned above. The last medieval Celtic language dealt with in this 
volume, Cornish in its middle (c. 1200-1600) and late (c. 1600-1750) phases, 
survived mainly in versified religious plays and translated works, scholarly 
editions of which constitute the corpus for the analysis in Joseph Eska and
Benjamin Bruch’s contribution. 


3 Overview of themes 


Digital corpora for medieval Celtic languages have certainly become a central 
part of the field of Celtic Studies in recent years, but fully annotated corpora are
still few in number and the application of computational linguistic methods in 
the analysis of Celtic languages is in its infancy. These languages represent a 
new frontier in the development of natural language processing tools, in part 
because they pose special challenges, such as complicated inflectional mor- 
phology with non-straightforward mappings between lemmata and attested 
forms, highly variable orthography, and initial consonant mutations. With so 
much data available in non-electronic form as the result of previous work and 
ongoing efforts to convert these data to computer-readable format, it is not sur- 
prising to find that the contributors employ both available digital corpora and 
printed editions or manuscripts in their research, and that quantitative studies 
are more often conducted in a data-based or data-inspired rather than data- 
driven manner. This approach shows great potential in revealing hitherto sub- 
tle generalisations over various aspects of medieval Celtic languages. 

A significant aspect of the volume is that the quantitative studies all deal 
with aspects of syntactic structure, a subsection of the grammar of medieval 
Celtic languages (Irish in particular) that has suffered relative neglect, in favour 
of investigations focusing on phonology and morphology. Happily, more work 
on syntax has appeared since Isaac (2003) gave a short survey of the few works 
in the field and pronounced a handbook of Old Irish syntax to be a desideratum.
Much of the work of the last decade and a half (e.g. García-Castillero 2013;
Griffith 2008; Lash and Griffith 2018; Roma 2014) draws directly on the increas- 
ing availability of searchable corpora that enable easy access to the fundamen- 
tal dataset. This explosion in research is set to continue with the development 
of CorPH. Bringing the results produced by central scholars participating in this 
endeavour together in one place emphasises the potential that corpus ap- 
proaches have in aiding research and underlines many points in need of further 
investigation. 

With its concentration on computational corpus linguistics and morphosyn- 
tactic data from historical language stages, this volume is a first in the discipline 
of Celtic Studies, which has been mainly focussed on traditional philological 
work such as the editing of texts and literary/historical explication of these 
texts. Additionally, it contrasts with and complements other recent volumes
of interest to scholars working in Celtic Studies, such as Formal Approaches to 
Celtic Linguistics (Carnie 2011), Linguistic and Philological Studies in Early Irish 
(Roma and Stifter 2014), the proceedings of the fourteenth International Congress of
Celtic Studies, held at Maynooth University, 1-5 August 2011 (Breatnach et al. 2015),
and Centres and Peripheries in Celtic Linguistics (Bloch-Trojnar and Ó Fionnáin 
2019). While each of these volumes consists of chapters analysing various stages of
the Goidelic and Brittonic languages, very few use corpus data or deal with problems
of corpus building. Moreover, many of these contain chapters that are more
philological, historical, or literature-oriented than strictly linguistic in nature. The 
present volume, in contrast, reflects the increasing awareness of the usefulness of 
corpus data in Celtic linguistics, and its contributions show how corpora of Celtic 
languages can be most effectively constructed and exploited. At the same time,
scholars who focus mainly on philology should still find many of the chapters interesting,
as they contribute to our knowledge of the grammars of medieval Celtic
languages from fresh perspectives. It is also hoped that chapters such as Marieke
Meelen's and Theodorus Fransen's, which showcase the development and testing
of new computational tools for Celtic language data, will appeal to linguists
in general, especially those who are interested in diachronic linguistic change,
computational linguistics, and corpora of historical languages.


4 Description of chapters 


The volume is divided into two thematically distinct but related parts. Part one 
consists of four chapters dealing with the design and creation of corpora for his- 
torical languages generally and Celtic languages in particular. Part two consists 
of seven chapters that are broadly united by the theme of description and quali- 
tative/quantitative analysis of linguistic data derived from the available cor- 
pora of medieval Celtic languages. The division into two main parts is motivated 
by thematic concerns, since the contributions fall into two general groups. There 
are, firstly, detailed technical discussions of corpus construction, automatic annotation
tools, and clustering methods (the chapters by Marius Jøhndal, Theodorus Fransen,
Marieke Meelen, and Christopher Yocum), and secondly, primarily corpus-based
analyses of particular phenomena (the chapters by Liam Breatnach, Carlos García-Castillero,
Jürgen Uhlich, Elisa Roma, Aaron Griffith, Joseph Eska and Benjamin Bruch, and Britta
Irslinger). The first part of the book is therefore, roughly speaking, practical,
with its concentration on computational research tools and methods, while
the second is analytical in focus. 
Within each part of the book, chapters are themselves grouped thematically.
Part one begins with two chapters (by Marius Jøhndal and Marieke Meelen, respectively)
that originate from discussions at the first and second ChronHib
workshops about the building and sustainability/maintenance of linguistically 
annotated corpora. Additionally, as a description of a new Welsh treebank,
Meelen’s chapter responds to some of the concerns about the need for better
ways of doing research on problems of Celtic syntax, as expressed by participants
at the second and third ChronHib workshops. The next two chapters in
part one concentrate on the creation and use of computational tools in order 
to analyse particular aspects of the Old Irish corpora (verbal morphology in 
Theodorus Fransen’s chapter and stylistic clustering in Christopher Yocum’s 
chapter). 

Part two begins with two chapters (by Liam Breatnach and Carlos García-Castillero,
respectively) that investigate the diachronic syntax and morphology
of pronouns and demonstratives in Old Irish. The following three chapters (by
Elisa Roma, Jürgen Uhlich, and Aaron Griffith, respectively) are all united
through their investigation of grammaticalised consonant mutations in Old 
Irish, whether in the context of relative clauses (Griffith and Uhlich) or after 
nominals (Roma). The final two chapters in part two (by Joseph Eska and 
Benjamin Bruch on the one hand and Britta Irslinger on the other) deal with 
some syntactic phenomena in the Brittonic languages. 


4.1 Description of Part 1 


Marius Jøhndal’s “Treebanks for historical languages and scalability” presents
both a general overview of the motivations for and practice of corpus building and
a detailed overview of the PROIEL family of treebanks. This group of
treebanks includes annotated texts from older Indo-European languages and 
is one of the most ambitious recent corpus-related projects for these lan- 
guages. It includes the original core, the PROIEL (Pragmatic Resources in Old 
Indo-European Languages) itself, which is a corpus of New Testament texts in 
Ancient Greek, Latin, Classical Armenian, and Gothic, as well as some other 
texts in some of these languages. Additionally, the PROIEL family also includes 
the ISWOC Treebank, consisting of texts in Old English and Old Romance 
(Spanish, Portuguese), and the TOROT database with texts in Old Slavic (Old 
Church Slavonic, Old Russian). One of the goals of the chapter is the introduction
of a new interface for browsing and searching the PROIEL Treebank and
related treebanks called Syntacticus (http://syntacticus.org). This expansion of
the PROIEL family of treebanks increases its visibility and is a crucial way of
achieving long-term maintenance. It is also an exemplary open-source infrastructure
that can be used for future projects. The chapter is therefore programmatic
and practical, since the kinds of technical, linguistic, and manpower-related
challenges it describes serve as both a guideline to best practice and an
inspiration for future research on Celtic languages. Although the chapter does 
not discuss Celtic languages in particular, in many respects it sets the tone for 
the volume since many of the issues mentioned in it, being characteristic of less- 
resourced historical languages, will be familiar to scholars of medieval Celtic 
languages and it is hoped that the chapter may serve as a call to collaboration. 

“Annotating Middle Welsh: POS tagging and chunk-parsing a partial corpus 
of native prose” by Marieke Meelen demonstrates the process of creating an an- 
notated corpus of some Middle Welsh native prose (as against translated works), 
and the challenges and potentials of building such a corpus. The corpus contains 
only literary narratives and some law texts at present but will be extended to 
other genres and registers. Digitised texts were pre-processed for punctuation
and tokenisation, which was done automatically by a POS tagger and a Memory-Based
Tagger. The text was then marked up with a simplified version of the TEI
P5 header. The author adopts the UPenn annotation scheme modified with 
Welsh-specific tags that enable further queries concerning agreement patterns 
and change in Information Structure. A Memory-Based Tagger assigns morpho- 
syntactic tags to tokens automatically and a modified rule-based chunk-parser is 
deployed to annotate syntax and information structure. This chapter presents the 
first systematic approach to annotating historical Welsh, and the corpus it de- 
scribes ultimately aims to provide a starting point to build a fully annotated 
Welsh historical treebank. 

In “Automatic morphological analysis and interlinking of historical Irish cog- 
nate verb forms”, Theodorus Fransen describes a computational approach to un- 
derstanding how the Irish verbal system develops diachronically. The author's 
major contribution is to propose a morphological analyser for Old Irish verbs and 
to discuss ways this analyser can be incorporated into a framework of computational
resources for various stages of Irish. This proposal dovetails with Jøhndal's
and Meelen's chapters in dealing with ways of expanding the current computational
toolset for a historical language (specifically historical stages of Irish) and
in its concerns with scalability. These concerns are reflected in his detailed inves- 
tigation of the challenges encountered by a methodology that incorporates finite- 
state morphology as it applies to Old Irish. The challenges he details are two- 
fold. The first challenge has to do with word and morpheme division as encoun- 
tered in "real" text, i.e. editions or manuscript transcriptions. In many cases, 
multiple morphemes may be written as a concatenated string, resulting in the 
need to find a way to encode licit combinatorial possibilities of multiple 
morphemes. This is a so-called generation problem, where generation means
the ability of the analyser to generate all and only the licit inflected forms of 
any given stem. In other cases, whitespace is found between morphemes lead- 
ing to potential parsing ambiguities since the analyser is word-based (where a 
word is understood to be an element between whitespace). This is a so-called 
analysis problem, which may result in the wrong morphological tag being as- 
signed to any given string. The second challenge has to do with the complex 
interaction between phonology (especially stress) and morphology in Old 
Irish since stress alternations can result in syncope and the presence or ab- 
sence of palatalisation of stem-final and ending-initial consonants. These 
challenges impinge on the choices made for implementing the finite-state 
transducer. For instance, does one rely on a strictly rule-based approach to 
specify certain licit combinations and handle stem variants induced by stress 
alternations, using “flag” morphemes or upper-level filters for instance to 
deal with the generation problem? Or does one hard-code (i.e. list) such stem 
variation or parts of paradigms? Fransen carefully weighs the advantages of 
different approaches in order to ensure the applicability of his analyser. He 
also envisions a fully functioning POS-tagger suitable for both Old and Middle 
Irish by making some suggestions for allowing interoperability of resources, 
especially between his morphological analyser and Dereza’s (2018) Old Irish 
lemmatiser. 

Christopher Yocum’s chapter “Text clustering and methods in the Book of 
Leinster” uses machine-learning techniques to cluster the texts in the Book of 
Leinster (LL), and tries to identify the reason for the clustering. The author ex- 
tracts individual texts from the electronic edition of LL, tags the function words 
and calculates the frequency of function words in each text. The frequencies 
are then turned into a matrix of vectors, which goes through the k-medoids al- 
gorithm, subject to normalisation and “Principal Component Analysis”. The re- 
sult is a clustering scatter plot. The clustering can be caused by the variables of 
author, scribe or genre, and these three factors are tested in turn. The result 
suggests that authorship is the main factor in clustering, and that the tradi- 
tional ascriptions to certain authors do not fit the clustering and may need to 
be revised. The methods used are innovative within Celtic Studies and contrast 
with the traditional philological approach to text clustering. The chapter is a 
useful addition to the large body of work on the history of the manuscript, and
the clusters of text reported on deserve further investigation. If specific linguis- 
tic usages can be associated with particular clusters, this may be useful for the 
study of idiolect/style at particular periods. 
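
The pipeline summarised above (function-word frequencies, a matrix of vectors, k-medoids clustering) can be sketched in a few lines. This is a minimal illustration and not the chapter's implementation: the function-word list and the four toy texts are English stand-ins invented here, the Principal Component Analysis step is omitted, and the exhaustive medoid search is only feasible for a handful of texts.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical function-word list and toy texts (English stand-ins, not LL data).
FUNCTION_WORDS = ["the", "and", "of", "to", "in"]

texts = [
    "the king of the land of the north",
    "the son of the king of the hill",
    "and he went to sea and to war in spring",
    "and she came to him and to them in winter",
]

def frequency_vector(text, vocab=FUNCTION_WORDS):
    """Relative frequency of each function word in one text (the normalisation step)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]

def distance(u, v):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def k_medoids(vectors, k):
    """Try every set of k medoids and keep the one with the lowest total distance."""
    best_cost, best_medoids = math.inf, None
    for medoids in combinations(range(len(vectors)), k):
        cost = sum(min(distance(v, vectors[m]) for m in medoids) for v in vectors)
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    # Label each text with the index of its nearest medoid.
    labels = [min(best_medoids, key=lambda m: distance(v, vectors[m])) for v in vectors]
    return best_medoids, labels

vectors = [frequency_vector(t) for t in texts]
medoids, labels = k_medoids(vectors, 2)
```

With these toy texts the first two and the last two fall into separate clusters, because each pair shares the same function-word profile. Whether such clusters reflect author, scribe or genre is exactly the question the chapter then tests.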

4.2 Description of Part 2


In “The demonstrative pronouns in Old and Middle Irish”, Liam Breatnach uses a 
corpus of Old Irish verse texts that are largely available online in TLH and CELT. 
The author first observes that there is a split between the unstressed enclitic
demonstrative particles -sin ‘that’, -so/se ‘this’ and their stressed pronominal
variants, sin ‘that’, só/sé ‘this’ (dative sund/siu). The rest of the chapter deals with a
diachronic investigation of the morphophonology, syntax, and semantics of the 
stressed demonstrative pronouns. The results of this investigation map the distri- 
bution of demonstratives according to four main features: syntactic function, sin- 
gular/plural number, inanimate/animate reference and period (i.e. Old versus 
Middle Irish). The main contribution of the chapter is that it highlights subtle
differences between Old and Middle Irish usages. First, while the stressed
demonstratives on their own (without the addition of the particle í) could be construed
as plural in both Old and Middle Irish, plural reference was very restricted in Old 
Irish, but much expanded in Middle Irish. Specifically, plural reference is found 
in Old Irish when the demonstrative acts as a subject of a copular sentence and 
in later Old Irish as the complement of an agreeing preposition. Middle Irish al- 
lows plural reference in some other contexts. Second, demonstratives with inani- 
mate and animate reference are likewise found in both Old and Middle Irish, but 
animate reference in Old Irish once again is restricted to subjects of copular sen- 
tences whereas it is found in other contexts in Middle Irish. The chapter closes 
with some discussion of the possibility that the independent, personal pronoun 
sé ‘he’ developed during Middle Irish from the demonstrative sé in contexts 
where it had animate reference. 

Carlos Garcia-Castillero’s chapter is titled “Paradigmatic split and merger: The 
descriptive and diachronic problem of Old Irish class B infixed pronouns”. This 
contribution replaces Garcia-Castillero’s lecture “Synonymy (aᴺ / aní ‘that (what)’,
aᴺ / inta(i)n ‘when’) and homonymy (aᴺ ‘that (what)’ and aᴺ ‘when’) in the Old
Irish glosses” presented at the third workshop, because the author had already
submitted the lecture for publication elsewhere. The contribution in this volume 
explains the diachronic origin of the Old Irish class B infixed pronouns, which are 
used in a declarative clause after pretonic lexical preverbs of the structure (-)VC-. 
The author firstly clarifies the relevant notions in Old Irish (clause types, verbal 
complex, phonotactic structure of preverbs, etc.), and then illustrates the use of 
non-third person infixed pronouns with instances collected from the corpus of the 
contemporaneous Old Irish glosses. This corpus-based approach yields the inter- 
esting observation that, in the language of the contemporaneous Old Irish texts, 
non-third person infixed pronouns are much less regular than the third person in- 
fixed pronouns in making a distinction between declarative and relative forms, 
especially when the lexical preverb after which the infixed pronoun appears is of
type (-)VC-. Such asymmetry in distribution between the persons raises a question, 
which, in the author’s opinion, is directly related to the diachronic origin of the 
class B infixed pronouns. The author argues that class B infixed pronouns arose to 
distinguish a verbal complex with a third person singular masculine or neuter in- 
fixed pronoun in a declarative clause from a complex without an infixed pronoun 
in a relative clause. More specifically, a process of morphological split in the origi- 
nal class C paradigm has given rise to two forms in the third persons, and tenta- 
tively in the other persons. 

Elisa Roma presents her findings on the distribution of nasalisation after 
nominals in Old Irish glosses in “Nasalisation after inflected nominals in the 
Old Irish glosses: Evidence for variation and change”, where her main interest 
lies in the possibility of mapping variation in nasalisation to chronological or 
diatopic criteria. All instances of nasalisation after nominals from four Old Irish 
corpora of glosses have been collected (Wb., Ml., Sg. and the Minor Glosses
Database). The phonetic contexts for nasalisation are categorised, as well as 
the word class of the nasalising/nasalised word. The frequency of nasalisation 
in each combination of phonetic context and word class has already been re- 
ported in Roma (2018a). Firstly, the data show that the absence of nasalisation 
after inflected nominals in Old Irish cannot be due merely to the loss of a nasal 
consonant in consonant clusters. Secondly, individual texts show different fre- 
quencies of nasalisation in the same context. The variation between Old Irish 
texts in nasalisation after inflected nominals suggests not only diachronic 
strata but also probable regional differences that led to later developments in 
Modern Irish and Scottish Gaelic. The chapter is comparable to other corpus- 
based investigations of morphophonology, such as Griffith (2016a) and Lash 
(2017a). Together with these papers, Roma’s chapter is illustrative of the impact 
lexicons and corpora have had on Celtic linguistics. 

In “On the obligatory use of a nasalising relative clause after an adjectival 
antecedent in the Old Irish glosses”, Jürgen Uhlich uses a corpus consisting of
the main Old Irish glosses (Wb., Ml., Sg.) to explore the extent to which adjec- 
tives having a modal adverbial reading must be followed by a nasalising rela- 
tive clause in cleft sentences (e.g. arndip maith nairlethar a muntir ‘so that he 
may well order his household’, lit. ‘that it may be good how he orders’). The 
author argues that, save for some well-defined exceptions, the nasalising rela- 
tive clause is an absolute prerequisite of this construction. His approach is at 
once quantitative, since he has systematically and exhaustively collected all in- 
stances of modal adjective cleft sentences from the glosses and studied their 
distribution, and qualitative, since he also carefully establishes and describes 
the varying types of “exceptions” to the generalisation. The exceptions to the 
generalisation include (a) cases in which the verb in the clause following the
adjective has an object marked with a class A or B infixed pronoun, (b) instan- 
ces of mixed antecedents in coordination where the antecedent farthest from 
the embedded clause is the modal adjective, (c) clauses involving what Uhlich 
terms “syntactic raising”, essentially multiple dependencies, where the modal 
adjective and another constituent simultaneously act as the antecedent to the 
embedded clause, and (d) some possibly innovative instances of leniting 
rather than nasalising relative clauses. The paper is an important contribution 
to a long-standing debate in Old Irish studies dealing with the rather complex 
syntax of relative clauses and its conclusion that a nasalising relative clause 
is an essential component in a modal adjective cleft revises the previous con- 
sensus that nasalising relative clauses were optional across much of the do- 
main in which they could be used. 

In Aaron Griffith’s chapter, “The ‘Cowgill particle’, preverbal ceta ‘first’, and 
prepositional cleft sentences in the Old Irish glosses”, he connects what he calls 
“three seemingly unrelated” phenomena: the phonological shape of the adverbial 
preverb ceta ‘first’, evidence for the so-called Cowgill Particle (*eti), and the usage 
of relative verbs in PP-clefts. The author investigates both the first and second 
vowel in ceta using a combination of a quantitative corpus-based approach and a 
qualitative comparative approach. In his discussion of the variation in the initial 
syllable of ceta (attested as both ceta and cita), he shows that the usage of the 
i-variant increases over time. He then argues that the final vowel of ceta, to- 
gether with the final vowel of the preverb ocu (in ocu-ben) could provide further, 
previously unexamined, evidence for the Cowgill Particle, if the initial vowel of 
*eti was not elided after preverbs ending in u (i.e. *kintu-eti, not *kintu-ti > ceta, 
*onku-eti, not *onku-ti > *ocu). Because the preverb ceta is predominately found 
in relative clauses, where the Cowgill Particle would in fact not be expected, the 
paper then shifts to a discussion of two examples in which a verb containing 
ceta is arguably non-relative. These two examples are both prepositional cleft 
sentences (e.g. ar is do thabirt diglae berid in claideb sin ‘for it is to wreaking re- 
venge that he carries that sword’), where a non-relative verb typically follows the 
prepositional phrase (PP). The author surveys the evidence for PP-clefts in the 
corpus of glosses and shows that, despite the general rule, the Milan Glosses 
have innovative relative verbs after the PP. While this leaves the status of the two 
examples containing ceta uncertain (they could either be non-relative, and there- 
fore evidence for the Cowgill Particle, or relative), the chapter is, like Uhlich’s, a 
useful contribution to the perennial debate on the syntax of cleft sentences and 
relative clauses in Old Irish. 

Britta Irslinger, in “The functions and semantics of Middle Welsh X hun(an):
a quantitative study”, uses two untagged corpora of Middle Welsh (Rhyddiaith
Gymraeg 1300-1425 / Welsh Prose 1300-1425 and Rhyddiaith y 13eg Ganrif:
Fersiwn 2.0) to investigate an innovative usage of the collocation X hun(an)
(where X is a possessive pronoun) as a reflexive pronoun in Middle Welsh. The 
author shows that the collocation X hun(an) was generally used as an intensifier 
in the corpora, in a manner similar to English myself in I saw him myself, but 
there is some evidence of its grammaticalisation as a reflexive pronoun. This new 
function of X hun(an) appears in fourteen instances out of a total of 1908 unique 
tokens of X hun(an), where it is used instead of the usual reflexive markers, the 
verbal prefix ym- or plain pronouns. The fourteen examples of reflexive usage 
come from translation literature, but it does not appear that the collocation X 
hun(an) corresponds to any particular intensifier marker in the base language. 
This suggests that the examples display a real innovation in Welsh grammar. The 
study is part of an ongoing effort (see references cited in the chapter) to under- 
stand the expression of reflexivity, reciprocal action, and middle voice in Welsh 
and also contributes to the debate over the extent to which English -self as an 
expression of reflexivity arose as the result of contact with Welsh. According to 
the author, the use of -self as a reflexive in English expanded from the mid- 
twelfth to the seventeenth century. Although this is not explicitly stated by the 
author, the fact that there are so few examples of X hun(an) used as a reflexive 
before 1425, i.e. after the first signs of the innovation in English, could suggest 
that the contact with Welsh was not the only factor in the development of -self. 

In “Prolegomena to the diachrony of Cornish syntax”, Joseph Eska and
Benjamin Bruch discuss the diachronic development of the configuration of the 
Cornish affirmative root clause with comparison to other Brittonic languages. 
Since verbal sequences do not occur in Old Cornish, examples from Old Welsh 
and Old Southwest Brittonic, showing VSO and V2 orders, are quoted, with the 
assumption that these languages behaved similarly to Cornish. The affirmative 
root clauses in Middle Welsh and Middle Breton are generally V2, and surface 
V2 (along with V3) is also found in Middle Cornish. The authors then analyse 
the architecture of the left periphery and the preverbal Object DP, pointing out 
that the exceptions to V2 in Middle Cornish are caused by metrical considera- 
tions overriding the grammar, and despite the corpus of Middle Cornish being 
composed largely of verse, the Middle Cornish affirmative root clause was V2 of 
the “relaxed” type. The authors then examine the corpus of Late Cornish texts 
and find that these are of dubious evidential value because the corpus is very 
small and consists of translations by a native speaker and texts by non-native 
speakers. 


Part 1: Corpus tools for historical Celtic 
linguistics 


Marius L. Jøhndal
1 Treebanks for historical languages and 
scalability 


1 Introduction 


Historical linguistics, whether synchronic or diachronic, is by definition based 
on corpora. Since we do not have access to the intuitions of native speakers we 
can only test linguistic hypotheses about historical languages by systematically 
collating information from our corpus of texts. 

For questions that typically concern linguists, this often means identifying 
every occurrence of a particular phenomenon in the corpus, analysing, classify- 
ing and counting the occurrences and then using this for testing hypotheses 
about the structure of the language. This can be done manually, but this is 
time-consuming and error-prone. As Haug (2015) points out, while reading the 
text and manually collating information from it is essential for hypothesis for- 
mation it is much less useful for hypothesis testing. Even if the text is in elec- 
tronic form, it is easy to overlook an example, record it incorrectly or fail to 
apply test criteria consistently over time. 

This paper focuses on treebanks, which are corpora that have been annotated 
with morphosyntactic information so that we can extract linguistic structures like 
‘verb with an accusative noun’. High-quality treebanks for a range of historical lan- 
guages now exist and are widely used in historical linguistic research. This includes 
treebanks that follow the Penn-style of annotation, e.g. the Penn-Helsinki Parsed 
Corpus of Middle English (Kroch and Taylor 2000), the Penn-Helsinki Parsed 
Corpus of Early Modern English (Kroch, Santorini, and Delfs 2004), the Penn- 
Helsinki Parsed Corpus of Modern British English (Kroch, Santorini, and Diertani 
2016), the Tycho Brahe Parsed Corpus of Historical Portuguese (Galves and Britto 
2002) and the Icelandic Parsed Historical Corpus (Wallenberg et al. 2011), as well 
as dependency-based treebanks, e.g. the Index Thomisticus (Passarotti 2007), 
the Ancient Greek and Latin Dependency Treebanks (Bamman and Crane 2011; 
Celano, Crane and Almas 2014), the PROIEL Treebank (Haug and Jøhndal 2008,
Haug, Eckhoff et al. 2009), the ISWOC Treebank (Bech and Eide 2014) and the 
TOROT Treebank (Eckhoff and Berdicevskis 2015). 

A key challenge in building treebanks for historical languages is lack of re- 
sources. Funding is limited and there are few existing computational language 
resources like taggers and parsers available. At the same time, the task is com- 
plex and experts on the language have to devote a significant amount of time to 
the annotation task. This comes on top of the complexity of designing a suitable
annotation scheme that balances the desire to capture philological and linguistic 
detail with an approach that is reliable, scalable and technically feasible. 

A key motivator behind treebank efforts is to facilitate reuse of resources 
and to provide access to large data sets that make hypothesis testing robust 
and encourage replication of published research, but as funding for construc- 
tion of a treebank tends to be tied to a time-limited research project, it is chal- 
lenging to fulfil such long-term aspirations and achieve scale and long-term 
consistency. 

This paper describes these challenges in the context of the PROIEL, ISWOC 
and TOROT treebanks, and how this has motivated efforts to use automated tools 
like taggers and parsers to scale the annotation process. The paper also describes 
Syntacticus (http://syntacticus.org), which now serves as a shared front-end for 
PROIEL, ISWOC and TOROT, but whose long-term aim is to integrate automated 
taggers and parsers with our existing annotation tools and offer this as an open 
infrastructure platform that can be used by researchers working on other less- 
resourced, historical languages within the Indo-European family, such as the 
Celtic languages. 

Section 2 briefly introduces the PROIEL, ISWOC and TOROT treebanks and 
some key properties of the annotation scheme. Section 3 describes the chal- 
lenges involved in maintaining these treebanks, expanding them and making 
them accessible for researchers, and how this has motivated us to set up 
Syntacticus. Section 4 describes in more detail current efforts aimed at evaluat- 
ing how the annotation process can be scaled using automated taggers and 
parsers. 


2 The PROIEL, ISWOC and TOROT treebanks 


The PROIEL-family of treebanks currently includes the PROIEL, ISWOC and 
TOROT treebanks. Together they contain text samples from a number of old 
Indo-European languages (see Table 1). When consolidated into one treebank, these
samples amount to around one million words that have been lemmatized,
morphologically analysed and annotated with syntactic dependencies.

The original PROIEL Treebank stems from a research project called Pragmatic 
Resources in Old Indo-European Languages at the University of Oslo (2008-2012), 
which was set up to study information packaging in ancient Indo-European 
languages. A major part of this was to compile a treebank containing the New 
Testament in the original and in translation, as the New Testament is a natural
parallel text that allows for cross-linguistic comparison of phenomena like
word order, anaphoric expressions, definiteness, background events and
discourse particles.

Table 1: Languages and token counts in the PROIEL Treebank release 20180408, the TOROT
Treebank release 20180919 and the ISWOC Treebank release 20160620.

Language              Number of tokens   Number of sentences   Treebank
Ancient Greek         250,455            18,173                PROIEL
Latin                 225,064            19,425                PROIEL
Gothic                57,211             5,457                 PROIEL
Classical Armenian    23,513             1,916                 PROIEL
Old Russian           235,275            24,716                TOROT
Old Church Slavonic   58,269             6,350                 PROIEL
Old Church Slavonic   82,007             8,371                 TOROT
Old English           29,406             2,536                 ISWOC
Old French            2,340              137                   ISWOC
Old Portuguese        36,595             2,027                 ISWOC
Old Spanish           54,661             2,615                 ISWOC

To achieve this the New Testament texts were annotated with morphosyn- 
tactic and information-structure annotation, and then aligned so that words in, 
for example, the Vulgate were linked to the words that they translate to in the 
Greek New Testament. 
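
As a rough illustration of what such token-level alignment amounts to, the sketch below links the opening words of John 1:1 in the Vulgate to their Greek counterparts. The index-to-index mapping and the helper functions are assumptions made here for illustration; they do not reproduce the treebank's actual data representation or its alignment decisions.

```python
# Opening words of John 1:1 in the Vulgate and in the Greek New Testament.
latin = ["In", "principio", "erat", "Verbum"]
greek = ["Ἐν", "ἀρχῇ", "ἦν", "ὁ", "λόγος"]

# Hypothetical alignment: index of a Latin token -> index of the Greek token it translates.
alignment = {0: 0, 1: 1, 2: 2, 3: 4}

def aligned_pairs(source, target, links):
    """Return (source word, target word) pairs in source order."""
    return [(source[i], target[j]) for i, j in sorted(links.items())]

def unaligned_targets(target, links):
    """Return target words that no source word is linked to."""
    linked = set(links.values())
    return [tok for j, tok in enumerate(target) if j not in linked]

pairs = aligned_pairs(latin, greek, alignment)
leftover = unaligned_targets(greek, alignment)
```

Here the Greek article ὁ is left without a Latin counterpart, which is the kind of cross-linguistic contrast (in this case in the marking of definiteness) that aligned parallel texts make easy to retrieve.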

The PROIEL Treebank has since been expanded with other texts in Latin and 
Ancient Greek, which have been morphosyntactically annotated. Since the end 
of the original PROIEL project, the long-term objective has been to expand the 
treebank to the point where it contains, to the extent that it is practically possible,
representative samples from different periods and genres. This is why, for ex- 
ample, the Latin section of the treebank now includes not just the Vulgate 
and texts from the classical canon, like Caesar's Gallic War, but also works 
like the Late Latin Peregrinatio Aetheriae and sections of Palladius' agricul- 
tural handbook, and at the time of writing Petronius' Satyricon and samples 
from Plautus are being prepared. 

In parallel to the continued expansion of the PROIEL Treebank, the ISWOC 
Treebank and the TOROT Treebank were set up. The ISWOC Treebank contributes 
samples from Old English, Old French, Old Spanish and Old Portuguese, while
the TOROT Treebank contributes a large and expanding selection of texts from 
Old Russian and Old Church Slavonic. Both are modelled on the PROIEL Treebank 
and were designed to be fully compatible. They therefore adhere to the same anno- 
tation scheme, were built using the same annotation process and rely on the same 
data representation (Eckhoff et al. 2018). 

Using the same annotation scheme offers a range of advantages. For lin- 
guists using the treebank the main advantage is that it becomes possible to test 
cross-linguistic hypotheses, but it also significantly simplifies the process of 
building a treebank if resources can be combined to design shared guidelines 
and build shared annotation infrastructures that reflect best practices. 

The Universal Dependencies project (Nivre et al. 2016) is today the largest 
collection of treebanks that have been harmonised in this manner, and Universal 
Dependencies has become the de facto standard within computational linguistics.
The PROIEL Treebank predates Universal Dependencies and uses a different
annotation scheme, but the PROIEL-style of annotation can be automatically 
converted to Universal Dependencies. The conversion relies on some heuristics 
but work is ongoing to align the PROIEL-style of annotation with Universal 
Dependencies so that these heuristics can be eliminated. 


2.1 The annotation scheme and the annotation process 


The PROIEL-style of annotation is based on multiple levels of annotation. Lemma, 
part of speech and morphological features are annotated at the morphological an- 
notation level. The syntactic annotation level includes labelled dependencies, as 
well as a combination of enhanced (or ‘secondary’) labelled dependencies 
and empty elements for representing syntactic phenomena that involve gaps, 
coindexing or displacement. The information structure level has annotation 
for givenness and anaphoric reference chains. The alignment level contains 
links between elements that are translational equivalents in two texts. 
Finally, the semantic level is used for free classification of data according to 
criteria like aspect or lexical semantics. 
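
The layered organisation can be pictured as a token record in which the morphological and syntactic fields are always filled and the remaining levels are optional. The sketch below is an illustration under stated assumptions: the field names, the relation labels and the constructed Latin sentence are chosen here for exposition and are not the treebank's actual schema. It also shows how a query such as ‘verb with an accusative noun’ reduces to a walk over the dependency links.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Token:
    id: int
    form: str
    lemma: str                          # morphological level
    pos: str                            # morphological level
    morph: dict                         # morphological level
    head: Optional[int]                 # syntactic level: id of the governing token
    relation: str                       # syntactic level: labelled dependency
    info_status: Optional[str] = None   # information structure level (optional)
    aligned_to: Optional[int] = None    # alignment level (optional)
    tags: dict = field(default_factory=dict)  # semantic level (optional)

# Constructed example sentence: 'the boy sees the girl'.
sentence = [
    Token(1, "puer", "puer", "noun", {"case": "nom", "number": "sg"}, 3, "sub"),
    Token(2, "puellam", "puella", "noun", {"case": "acc", "number": "sg"}, 3, "obj"),
    Token(3, "videt", "video", "verb", {"person": "3", "number": "sg"}, None, "pred"),
]

def verbs_with_accusative_noun(tokens):
    """Return (verb form, noun form) for every accusative noun governed by a verb."""
    by_id = {t.id: t for t in tokens}
    hits = []
    for t in tokens:
        if t.pos == "noun" and t.morph.get("case") == "acc" and t.head is not None:
            head = by_id[t.head]
            if head.pos == "verb":
                hits.append((head.form, t.form))
    return hits

hits = verbs_with_accusative_noun(sentence)
```

Because the optional levels default to empty values, a record of this kind can be created as soon as morphology and syntax are in place and enriched later.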

Each annotation level allows for annotation of individual tokens. Some lev- 
els are also defined for larger textual units like sentences or paragraphs, but 
the annotation process itself is designed around sentences as the minimal unit. 
As annotation of a text progresses, each sentence is individually assigned an 
‘annotated’ or ‘reviewed’ status, where ‘annotated’ indicates that the sentence 
has been annotated by the primary annotator and ‘reviewed’ indicates that it 
has also been approved by the secondary annotator. A sentence has to have 
complete annotation on both the morphological and syntactic levels before it
can be assigned the ‘annotated’ or ‘reviewed’ status, while the other levels of 
annotation are optional and can be added independently. 
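
The status rule just described can be stated as a small check. This is a sketch only: the status name ‘unannotated’, the dictionary representation of tokens and the function names are assumptions introduced here, not features of the annotation tool.

```python
from enum import Enum

class Status(Enum):
    UNANNOTATED = "unannotated"   # assumed name for the initial state
    ANNOTATED = "annotated"
    REVIEWED = "reviewed"

# Levels that must be complete before a sentence can change status.
REQUIRED_LEVELS = ("morphology", "syntax")

def is_complete(tokens):
    """True when every token carries annotation on both required levels."""
    return all(tok.get(level) is not None for tok in tokens for level in REQUIRED_LEVELS)

def set_status(tokens, new_status):
    """Return the new status, refusing 'annotated' or 'reviewed' for incomplete sentences."""
    if new_status in (Status.ANNOTATED, Status.REVIEWED) and not is_complete(tokens):
        raise ValueError("morphological and syntactic annotation must be complete")
    return new_status

complete_sentence = [
    {"morphology": "noun, nominative singular", "syntax": ("sub", 2)},
    {"morphology": "verb, third person singular", "syntax": ("pred", 0)},
]
partial_sentence = [{"morphology": "noun, nominative singular", "syntax": None}]
```

Optional levels such as information structure play no part in the check, which matches the statement that they can be added independently.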

The annotation scheme used on the morphosyntactic level is broadly 
aligned with ‘school grammar’ in the sense that assumptions about morphology 
and syntax are not too different from what would be expected by students who 
have studied the language but not necessarily formal linguistics. The scheme 
by default also tries to adhere to linguistically informed conventions for the lan- 
guage and its philological traditions. For Latin, for example, lemmatising is 
based on the Oxford Latin Dictionary but has been adapted to make the relation- 
ship between headwords and parts of speech more predictable so that each 
lemma in the treebank has one and only one normalized headword form and 
one and only one part of speech. 

Although no linguistic annotation is ever completely theory-independent, 
morphological annotation is generally uncontroversial as philologists and lin- 
guists of different persuasions generally follow the same conventions. Syntactic 
annotation is a different matter with wide-ranging disagreement among re- 
searchers. The syntactic annotation in PROIEL-style treebanks is based on de- 
pendency grammar. Dependency grammar is not well developed as a linguistic 
theory, but the PROIEL-flavour of dependency grammar has been enriched with 
formal devices that can handle syntactic structures like raising and control. The 
implementation of these devices and the specific analyses of structures with 
‘gaps’ or long-distance dependencies is based on Lexical-Functional Grammar 
(Kaplan and Bresnan 1982, Bresnan 2001), whose functional structures were in 
turn influenced by dependency grammar. Grammatical functions like subject 
and object are primitives in Lexical-Functional Grammar and this assumption 
has also been carried over into PROIEL-style dependency grammar along 
with Lexical-Functional Grammar’s criteria for identifying these grammatical 
functions. 

Dependency grammar-based annotation was chosen over an annotation 
scheme rooted in constituency structure in part because of its near-universal 
adoption in current computational work, and in part because it makes it possi- 
ble to annotate free-word-order languages consistently. Haug (2015) discusses 
the latter point in more detail, as well as broader methodological motivations 
and the practical implications of this choice. 

The details of the syntactic annotation scheme and the precise handling of 
specific syntactic structures are complex and well beyond the scope of this 
chapter, which aims to give only a brief overview of the key characteristics of 
the treebanks. For further details on the morphosyntactic annotation scheme 
the reader is directed to the overviews by Haug and Jøhndal (2008), Haug,
Jøhndal et al. (2009), Haug, Eckhoff et al. (2009) and Eckhoff et al. (2018),
while the design of the annotation scheme for information structure is de- 
scribed in Haug et al. (2014). 


3 Long-term scalability and maintenance 
challenges 


A number of early design choices contributed to the success of treebanks that 
use the PROIEL-style of annotation. Annotation requires specialist knowledge, 
so it was crucial to be able to recruit students and researchers across the world as
annotators. This required a tool that supported distributed annotation and that
did not have to be installed on the annotator’s computer, as this would have
required us to provide technical support to annotators. We also needed a tool 
that could be tailored to the evolving annotation scheme and allow us to make 
continuous improvements to the software without disrupting annotators. No 
such tool existed in 2008 when work on the PROIEL Treebank started. We there- 
fore opted to develop our own annotation tool as a web application. 

The use of dependency grammar and the organisation into multiple levels 
of annotation, in which each level is independent and can be conceptualised 
either as a graph with nodes and edges or as pairs of tokens and feature struc- 
tures, allowed for a flexible data model that could be mapped onto standard 
technologies for data representation and storage like XML and relational data- 
bases, and it permitted researchers to work independently, adding other anno- 
tation levels when resources and expertise became available. 

Treating the sentence as the smallest unit that can be annotated and re- 
viewed on its own is also a design decision that has worked well in practice as 
it made it possible to release data in batches, even when texts were not 
completely ready, and to preserve the history of changes in a practical way. 

Finally, the Lexical-Functional Grammar-influenced variety of dependency 
grammar has proven to be easy for annotators with philological training to 
learn and apply consistently. It also allows for some flexibility in designing 
consistent analyses of syntactic structures across languages when there is dis- 
agreement in the linguistic literature on what the correct analysis is. 

Other design choices have in hindsight proven to be suboptimal or have 
blocked progress. The model of having a primary annotator with a secondary 
annotator as a reviewer was put in place to ensure consistency while the anno- 
tation scheme was still being developed, and was subsequently used to ensure 
that the three treebanks were compatible and used formal devices in the same 
way. This relied on extensive coordination between reviewers and centralised
training of annotators. This approach worked well when several annotators 
were working intensively on annotating multiple texts in parallel but is not 
cost-effective today when only a few annotators occasionally work on expand- 
ing the treebank. 

The process for developing documentation was not integrated with the anno- 
tation tool itself. Unfortunately, documentation efforts have therefore not kept 
up with annotation and the documentation is neither consolidated nor complete. 

On the technological side, the annotation tool is monolithic, so it is hard to 
break it up or replace components. This makes it challenging to modify it or the 
data model that it uses. This is a particular issue in two areas. First, it has ham- 
pered integration with external automated taggers and parsers, which is neces- 
sary since the tool itself only has built-in support for generating suggestions 
using finite-state transducers or by looking up the annotation that an annotator 
has already chosen for a token with the same surface form. Second, it has 
slowed down efforts to address weaknesses in the data model, which is a partic- 
ular concern as the data model lacks support for sub-token annotation, e.g. an- 
notation of compound words or infixes. 

In combination these challenges now constitute a significant barrier to fur- 
ther expansion of the treebanks and are risk factors when it comes to long-term 
maintenance and accessibility. 


3.1 Syntacticus 


To address the long-term scalability, maintenance and accessibility chal- 
lenges, we launched Syntacticus in 2018. The aims of Syntacticus are (1) to 
increase the visibility, accessibility and discoverability of the PROIEL, ISWOC 
and TOROT treebanks, (2) to develop processes for long-term maintenance, 
(3) to improve the scalability of the annotation process and (4) to provide an 
open infrastructure platform for other researchers working on less-resourced, 
historical languages. These are ambitious aims that will take time to achieve. 
Aim 4, in particular, is a long-term aspiration. Aims 1 and 2, on the other 
hand, are crucial for ensuring that the treebanks remain accessible and reli- 
able. Aim 3, in turn, is a requirement if continued expansion is going to be 
economically feasible. 

Visibility, accessibility and discoverability (aim 1) have been addressed by 
setting up a dedicated website for Syntacticus that pro- 
vides much more direct access to data from the treebanks than before. Crucial 
elements include removing all registration barriers, incorporating elements of
the familiar search-engine paradigm in the user interface and making more of
the treebank data indexable by search engines. We have also included direct 
access to data that have been synthesised from treebank data like dictionary 
resources that are automatically generated from the morphosyntactic annota- 
tion. The Varangian Rus Project (Eckhoff and Berdicevskis 2016) has in turn 
built an Old Russian dictionary with glosses in Russian and English on top of 
the synthesised dictionary for Old Russian. 

At the time of writing much work remains to be done before the Syntacticus 
site is mature and satisfies our requirements, but the process for achieving this 
is well understood and achievable given recent advances in web technology 
and the broad availability of suitable open-source software components. The re- 
mainder of this paper is devoted to discussing how we aim to address annota- 
tion scalability (aim 3), which presents significant challenges for low-resourced 
languages. 


4 Scaling morphosyntactic annotation 


Manual annotation of lemma, part of speech and morphological features is 
time-consuming, error-prone and very tedious for annotators. The practical ex- 
perience from PROIEL, ISWOC and TOROT has shown that annotation speed in- 
creases and the error rate decreases when annotators are provided with some 
automated assistance, such as pre-populated annotation fields that they can 
correct or a list of suggested annotations that they can choose from. The effect 
is positive even when this assistance is very crude and generated using simple 
methods, such as looking up annotations that have already been made earlier 
in the text, ranking them by frequency and serving them to annotators as 
suggestions. 
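The crude lookup described above can be sketched as follows. This is only an illustration under our own assumptions: the class name, the representation of an annotation as a tuple and the example tags are hypothetical and do not reproduce the PROIEL tool's implementation.

```python
from collections import Counter, defaultdict


class SuggestionIndex:
    """Frequency-ranked lookup of earlier annotations, keyed by surface form."""

    def __init__(self):
        # surface form -> Counter of annotations chosen for that form so far
        self._seen = defaultdict(Counter)

    def record(self, form, annotation):
        """Store one annotation that an annotator has chosen for a form."""
        self._seen[form.lower()][annotation] += 1

    def suggest(self, form, limit=5):
        """Return earlier annotations for this form, most frequent first."""
        counts = self._seen.get(form.lower(), Counter())
        return [annotation for annotation, _ in counts.most_common(limit)]


# Hypothetical usage: annotations are (lemma, part of speech) tuples.
index = SuggestionIndex()
index.record("est", ("sum", "V"))
index.record("est", ("sum", "V"))
index.record("est", ("edo", "V"))
print(index.suggest("est"))
```

Forms that have not been annotated before simply yield an empty list, which is the point at which a rule-based analyser would have to take over.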

More sophisticated and higher-accuracy assistance can be provided if we 
use automated taggers, parsers and other techniques in natural-language proc- 
essing (NLP). The difficulty here is that historical languages are, in NLP jargon, 
low-resource languages. This means that the data sets and models that are pre- 
requisites for applying many NLP techniques do not usually exist and have to 
be built largely from scratch. For example, in order to use a statistical part-of- 
speech tagger you would have to train the tagger using a corpus that has al- 
ready been annotated with parts of speech. 

While some required language resources, like part-of-speech-tagged corpora, 
do exist for the most widely studied historical languages, they may not be suitable
for the task. It is common for such resources to be too small, or to suffer
from inconsistent quality or licensing incompatibility. Even when high-quality,
freely reusable resources do exist, different design decisions or idiosyncratic 
technological choices can make reuse a complex and time-consuming task. 

This chicken-and-egg problem, posed by having to annotate a substantial 
amount of data before automated methods can be used to assist with the task, 
is a particular challenge if, as is often the case, the annotated corpus itself is 
one deliverable within a larger research project whose primary aim is actually 
to answer linguistic and philological questions. 


4.1 Rule-based tagging and ambiguity 


Lack of suitable resources has been the situation for most of the languages in 
Syntacticus. The approach we have taken when starting the annotation process 
for a language is to rely on a combination of (1) a crude mechanism for looking 
up existing morphosyntactic annotation weighted by frequency, (2) rule-based 
morphological analysers that provide guesses for inflectional forms that have 
not been annotated before, and (3) hand-crafted rules for deriving probable 
syntactic labels from the morphological analysis. 

Our rule-based morphological analysers are written using finite-state mor- 
phology (Beesley and Karttunen 2003), which is a well-understood technique 
for mapping surface inflectional forms to morphological analyses. Writing a 
complete finite-state morphology for a language is a large undertaking, but 
while a finite-state morphology with high coverage may be desirable for other 
purposes, we have found that in practice we only need a finite-state morphol- 
ogy with limited coverage and mainly benefit from it in the initial phase of an- 
notating texts in a new language. 

The finite-state morphology can build on a combination of a manually com- 
piled list of high-frequency function words with analyses, and rules for high- 
frequency inflectional classes. If an electronic lexicon is available, it may be 
possible to combine this with the rules for inflectional classes. If no such lexi- 
con is available or it lacks details of inflectional classes, we can instead use a 
stem guesser that allows us to guess unknown words based on what a likely 
stem is. A stem guesser will, however, over-generate unless accurate rules for 
possible stems and possible combinations of stem and affix can be formulated. 
If the stem guesser over-generates, annotators will be faced with a range of 
nonsense annotation suggestions. Unfortunately, the information in reference 
grammars is in practice not detailed enough to formulate precise constraints, 
even for a well-documented language like Latin, so a phase of experimentation 
is necessary to achieve the right balance.


4.2 Statistical approaches and data sparsity


For every surface form there may be a number of possible analyses. A finite- 
state morphology, or any other method that simply looks at a single inflectional 
form in isolation and maps it to possible analyses, cannot on its own disambig- 
uate them. For Syntacticus this means that when the finite-state morphology is 
applied annotators are served a list of tuples of lemma, part of speech and the 
ten morphological features of the PROIEL annotation scheme. Depending on 
the properties of the language, the list of suggestions can be very long and it 
can require time and concentration for the annotator to pick the right combina- 
tion, especially since many candidate analyses differ only in one morphological 
feature. 

Because of the potential for multiple possible analyses to degrade annota- 
tion speed and accuracy, it is paramount to perform disambiguation. As the an- 
notated part of the corpus grows, this becomes possible using statistical NLP 
techniques. 

The canonical method for statistical part-of-speech or morphological tag- 
ging uses supervised machine learning. In supervised machine learning the 
system is given a training set which consists of an input with features and their 
correct labels. In this case the features are the surface word forms to be tagged, 
or parts of those word forms, and the labels are the parts of speech, lemmas or 
morphological features. Then, using a machine-learning algorithm, the system 
produces a classifier that can assign labels to new inputs. In other words, the 
system is given the correct answers for part of the data and then uses this to 
infer a model that can generalize to unseen data. 
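A deliberately simple sketch may make the train-then-classify cycle concrete. It is not the TnT tagger or any tagger used in the projects discussed here: the learner merely counts which label co-occurs most often with each feature, the features (the full form and up to three final characters) are our own assumption, and the toy training pairs are invented for illustration.

```python
from collections import Counter, defaultdict


def features(form, max_suffix=3):
    """Yield features from most to least specific: full form, then suffixes."""
    form = form.lower()
    yield "form=" + form
    for n in range(min(max_suffix, len(form) - 1), 0, -1):
        yield "suffix=" + form[-n:]


def train(pairs):
    """Count how often each feature co-occurs with each label."""
    counts = defaultdict(Counter)
    overall = Counter()
    for form, label in pairs:
        overall[label] += 1
        for feat in features(form):
            counts[feat][label] += 1
    return counts, overall


def predict(model, form):
    """Label a form using the most specific feature seen in training."""
    counts, overall = model
    for feat in features(form):
        if feat in counts:
            return counts[feat].most_common(1)[0][0]
    # Nothing matched: fall back to the most frequent label overall.
    return overall.most_common(1)[0][0]


# Invented toy training set of (surface form, part of speech) pairs.
training = [("amat", "V"), ("monet", "V"), ("regit", "V"),
            ("rosa", "N"), ("puella", "N"), ("et", "C")]
model = train(training)
print(predict(model, "audit"), predict(model, "stella"))
```

Even this baseline shows why sparsity matters: an unseen form can only be labelled by backing off to shorter and less informative suffixes.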

For historical reasons, our existing toolchain only allows off-line tagging. 
This means that statistical tagging is done as a separate batch operation in 
which the morphological annotation level is populated with the output of the 
tagger. This takes place before annotators start their work and may have to be 
repeated at regular intervals as annotators correct suggestions from the tagger. 
Our experiments with this ‘pre-annotation’, which have mostly used the TnT 
tagger (Brants 2000), show a significant positive effect on annotator perfor- 
mance (Skjaerholt 2011). The process is, however, very inflexible and does not 
allow us to make full use of modern taggers. Instead, automated tagging should 
be done on-the-fly as annotators work on the text. We also need to understand 
better what affects accuracy so that we can tune parameters and build a pipe- 
line that does text normalisation when this has a positive effect on accuracy. 

The key challenge when applying statistical techniques to a historical lan- 
guage is data sparsity. Morphological complexity is one contributing factor as it 
makes each individual inflectional form less frequent than in less morphologically
complex languages, and for morphologically more complex languages the
models that perform well are not necessarily the same as those that perform 
well for morphologically less complex languages. Despite this, reasonable re- 
sults can be achieved. As an illustration, Celano, Crane and Majidi (2016) re- 
port 88% average accuracy for Ancient Greek part-of-speech tagging. 

Another reason for data sparsity is lack of standardised orthography. The 
level of standardisation differs significantly between historical languages and 
is a complex issue that spans the degree of variation in manuscripts, the philological
traditions and conventions of published editions for a particular historical
language. For texts whose orthography shows significant variation,
normalisation of the text before training a model and before tagging may signif- 
icantly improve results. Berdicevskis, Eckhoff and Gavrilova (2016) report for a 
Slavic corpus that tagging accuracy improved significantly with text normalisa- 
tion (89.5% accuracy for POS-tagging and 81.5% for a ten-feature morphology). 

As a rule of thumb, a small training set will lead to low accuracy, and his- 
torical corpora in general tend to be small. While there are large corpora for 
historical languages like the Bibliotheca Teubneriana Latina, which contains 
around 13 million words of Latin, we have to keep in mind that such collections 
cover a multitude of genres, a range of registers and sociolinguistic variation 
and, importantly, texts produced at very different times. The effect of such 
intra-corpus variation on accuracy is not clear. Birnbaum and Eckhoff (2018) 
find that for tagging Byzantine Greek results improve when the tagger is trained 
on a corpus that contains a combination of Ancient Greek, Koine and Byzantine 
Greek (91.3% accuracy for POS-tagging and 94.0% accuracy for ten-feature mor- 
phological tagging), despite the internal variation within this corpus. On the 
other hand, Adesam and Bouma (2016), in work on Old Swedish, show that 
when a tagger is trained on one section of a text and then used to tag another 
section of the same text, accuracy is very high (94.2% average accuracy for 
POS-tagging and 83.2% for POS and morphology), but when it is used to tag a 
different text with similar properties the results are vastly inferior even when 
various forms of text normalisation are applied (69.9% for POS-tagging and 
49.0% for combined POS and morphology). 

Existing work on historical languages has mainly focused on part-of-speech 
tagging, morphology and lemmatising, but we know from work on other lan- 
guages that the challenges for dependency parsing are similar. The conventional 
approach is to perform dependency parsing as a separate step after tagging. As 
an illustration of the range of accuracy that can be achieved in this way, results 
from a large-scale experiment that included the Universal Dependencies version 
of the PROIEL Treebank were in the range of 80% unlabelled accuracy and 75% 
labelled accuracy for Latin and Ancient Greek (Alberti et al. 2017). It is interesting
to note, however, that current state-of-the-art parsing approaches that have been
designed to work with raw text as input, and which have been trained using data 
from many languages, perform worse on historical languages than on living lan- 
guages primarily because it is difficult for them to determine sentence bound- 
aries from the inconsistent punctuation and spelling conventions in historical 
data (Zeman et al. 2017). 


5 Conclusion and future work 


In this chapter we have described some scalability challenges that the PROIEL, 
ISWOC and TOROT treebanks face, discussed the motivation for setting up 
Syntacticus and reviewed some early results from relevant studies on auto- 
mated tagging and parsing of the historical languages that Syntacticus covers. 

Our work on scaling the annotation process with taggers and parsers is in 
its early phase, and while the studies on automated techniques reviewed here 
show some promising results, it is not certain that these techniques will lead to 
actual improvements in annotation speed and accuracy. Before we can deter- 
mine this we have to integrate these tools with our existing annotation tool- 
chain and conduct experiments with our online annotation tool in realistic 
online annotation scenarios. 

Work on Syntacticus as a platform is also still in its early phase. In particu- 
lar, we have to scale the annotation process to new languages. This may require 
applying more specialised techniques, such as annotation projection, an ap- 
proach in which existing annotation in one language is mapped onto another 
language. Sukhareva et al. (2017) have recently demonstrated that this can be 
used successfully to induce a part-of-speech tagger for Hittite using Hittite texts 
that had been aligned with German texts. 

Another unsolved issue for new languages concerns the interaction with to- 
kenisation. For annotating Sanskrit, for example, it is necessary to do tokenisa- 
tion and tagging at the same time since Sanskrit texts show the surface forms 
that result after the application of sandhi, which often removes the word bound- 
aries that taggers and parsers tend to assume are present. Moreover, work by 
Inglese, Molina and Eckhoff (2018) shows that the data model itself may need to
be revised substantially to support sub-token annotation for languages like 
Hittite, where the relationship between text and annotatable unit is particularly 
complex. 


Marieke Meelen 


2 Annotating Middle Welsh: POS tagging 
and chunk-parsing a corpus of native 
prose 


1 Introduction 


For a study on syntactic changes in Middle Welsh (see Meelen 2016), it was desir- 
able to compile a searchable corpus of Middle Welsh, at least a partial one includ- 
ing the most important narrative literature from the medieval period. This chapter 
presents this first annotated corpus of historical Welsh. For the present study, a 
selection of Middle Welsh texts was used, based on their popularity among Welsh 
philological and linguistic researchers (the tales of the Mabinogion) and a less- 
researched excerpt of a law text (the Laws of Women) to compare the results in a 
different genre. This selection forms a first step towards the creation of a 
much larger, and well-balanced, Parsed Historical Corpus of the Welsh Language 
(PARSHCWL, see Meelen and Willis forthc.). 

The White Book of Rhydderch and the Red Book of Hergest manuscripts, both 
dating from the 14th century, contain the most famous collection of Middle Welsh 
native¹ literature: the Mabinogion. These tales (of unknown authorship) derive in
part from an oral literary tradition. They are thought to have their origin in the 
early medieval period but were only put down in writing several centuries later 
(see, among others, Davies 1998). For the present chapter, all extant tales of the 
Mabinogion (11 in total) were annotated, representing the narrative prose of the 
Middle Welsh period of the language, c. 1150-1500 AD. High-definition photo- 
graphs of both of these manuscripts are available online via the websites of the 
National Library of Wales (White Book manuscript Peniarth 4-5) and Jesus College,
University of Oxford (Red Book manuscript Jesus College 111). The earliest text
for the present corpus is an
excerpt from a law text mainly concerned with the rights of women. This excerpt 
of the Early Welsh laws is taken from the BL manuscript Add. 22356 (S), one of 
the most important manuscripts in the tradition of the Welsh Law of Hywel. 
This Law of Hywel was the system of law in use in medieval Wales, based on an 


1 The term ‘native’ refers to the texts that are assumed to be originally composed in Middle 
Welsh, as opposed to ‘translated’ texts that are translations from other languages, such as 
chronicles translated from Latin.

older custom system. It is named after Hywel Dda (‘Hywel the Good’), a 10th-
century Welsh king. The British Library Law manuscript Add. 22356 (S), how- 
ever, is from the mid-15th century. As with all Middle Welsh law texts, al- 
though the manuscripts are mostly dated from the late medieval period, 
(parts of) the texts go back many centuries (Davies 1966). The latest edition is 
accessible online via www.cyfraith-hywel.org.uk.

Although many of these manuscripts are available as digital photographs, not 
all of them have been converted to (online) editions with search options that facili- 
tate philological and linguistic research. A large collection of Middle Welsh texts, 
including the texts of the Mabinogion included in the present study, is now available
online through the Welsh Prose project, but the search function is limited,
and the texts are not annotated in any
way. Searchable corpora are indispensable tools in historical linguistic research, 
both from a quantitative and qualitative perspective. Quantitative corpus research,
for example, comprises investigations of the distribution of the different forms and 
constructions that are attested in a wide variety of texts. When analysing large 
amounts of texts or corpora in this way, linguists need to be extremely consistent 
in their approach to get meaningful and testable results. Categorising and labelling 
forms or structures in large amounts of data can be prone to error, because even 
the most careful researcher can change their ideas about the features and charac- 
teristics they use to disambiguate categories as they are confronted with more and 
more material. As the number of texts in need of investigation grows, it is no lon- 
ger feasible to simply read and make notes. A further disadvantage of manual 
notes is that the results are much harder to verify and replicate, which is problem- 
atic in quantitative studies in particular. But qualitative studies can also benefit 
from searchable (annotated) corpora, as with improved computational methods 
specific or rare forms under investigation (for the purposes of philological research 
or comparative reconstruction, for example) are much easier to find. Therefore, it 
is useful to employ methods from the field of Natural Language Processing (NLP) 
and the tools created by computational linguists to build annotated searchable cor- 
pora. Because of their computational nature, these tools are designed to consis- 
tently deal with large amounts of data in a very short period of time. The results 
are consistent, following strict rules that are well-described in the annotation man- 
uals (see Meelen and Willis forthc.), and can then be made readily available for 
any (Welsh) linguist. 

Having said this, however, as an inflected language without a standardised 
orthography, Middle Welsh poses some specific challenges for ready-built NLP 
tools like part-of-speech (POS) tagging and parsing algorithms that automati- 
cally add morphosyntactic information (see Sections 3 and 4). Initial consonant 
mutation, for example, can yield a word like pawb ‘everyone’ in three different
ways: pawb ‘everyone’ (no mutation), a phawb ‘with everyone’ (aspirate mutation)
and i bawb ‘to everyone’ (soft mutation). Furthermore, with five tenses
(Present, Past, Pluperfect, Present Subjunctive and Conditional/Imperfect), seven 
different person-number-gender suffixes, various sets of pronouns and clitics 
and a wide range of functional particles, Middle Welsh with its rich morphology 
and extensive orthographical variation is far more difficult to automatically an- 
notate than a morphologically poor and orthographically standardised language 
like Present-Day English. 

This chapter presents the first systematic approach to annotating historical 
Welsh, ultimately aiming to provide an excellent starting point to build a fully an- 
notated Welsh historical corpus (see also Meelen et al. 2017 and Meelen and Willis 
forthc.). This chapter discusses some of the challenges faced developing the meth- 
odology and annotating the first part of this corpus. Some of these challenges are 
specific to Welsh (or other Celtic languages), others inherent to working with his- 
torical corpora in general. Section 2 gives a brief overview of how the corpus was 
compiled and pre-processed. In Section 3, I discuss the procedure of part-of- 
speech (POS) tagging and of developing a tag set to annotate Welsh (and poten- 
tially other languages) with rich verbal and prepositional inflection. Sections 4 
and 5 shed light on adding syntactic and information-structural features to the 
data so that it can be queried in various ways. In the final section, I demonstrate 
how this first annotated corpus of Middle Welsh prose can be extended and that 
this entails promising opportunities for future research. 


2 Pre-processing


2.1 General philosophy and goals


Ideally, any corpus should be well-balanced in terms of text genre, length, origin 
etc. When working with historical data, however, the choices are often limited, 
and creating detailed annotation is extremely time-consuming. Therefore, this first 
annotated historical corpus of Middle Welsh focussed on the most commonly used 
editions² of the 11 native tales of the Mabinogion and some excerpts from the laws
only. Future extension of the corpus will include alternative manuscript versions 


2 The editions used for this first annotated corpus are Williams (1951) for the Four Branches, 
Bromwich and Evans (1992) for Culhwch and Olwen, Thomson (1997) for Gereint, Thomson 
(1968) for Owein, Goetinck (1976) for Peredur, Roberts (1975) for Lludd and Llefelys, Williams 
(1908) for Breuddwyd Maxen, Richards (1948) for Breudwyt Ronabwy. For the new and complete

of each of these texts. In addition to that, the corpus will be extended to include
more texts from different genres and registers such as the historical
chronicles of the kings and princes and translated texts such as chronicles or 
Bible translations. 

The main aim of this project was not to give a complete syntactic analysis or 
to provide a detailed parsed structure. The part-of-speech (POS) tags contain 
highly detailed morphological information (see Section 3), but the phrasal annotation
is confined to a hierarchical chunk parse (see Section 4). In this way, the annotated
corpus could remain theory-neutral. At the same time, queries for linear
order and basic hierarchical phrase structure (e.g. to find noun phrases within 
prepositional phrases) are still possible. Finally, future enrichment of the chunk- 
parsed corpus is enabled, because of its flexible XML format (see Section 5). 

Any controversial decisions were avoided as much as possible by backing off 
to a simpler form of annotation. For example, interjections were simply labelled 
as ‘INTJ’ instead of aiming to classify them further. Similarly, constructions that 
are changing over time were consistently annotated to facilitate future studies. A 
good example is the sef-construction³ in Welsh. The information-structural status
of this construction changes from initial identificatory focus to plain predicate 
focus in the course of the Middle Welsh period (see Meelen 2016: 272-284). Since 
most texts are difficult to date exactly, throughout the corpus the specific tag SEF 
for any occurrence of this type of sentence was used (see also Section 3 on POS 
tagging). In this way, all sentences with the sef-construction can be easily found 
and examined in context. 


2.2 Preparation and text formatting 


There are various orthographical inconsistencies in the various versions of the 
texts of the Mabinogion (cf. Thomson 1986: xi), e.g. yniueroed~y niueroed ‘num- 
bers’, mywn~mewn ‘in’, etc. For the present study, the texts were not extensively 
pre-processed through text normalisation or orthographical regularisation, be- 
cause there was no stemmer available yet for Middle or Early Modern Welsh. 
A stemmer can automatically detect the root or stem of inflected and conjugated 


Parsed Historical Corpus of the Welsh Language (PARSHCWL), we will create new editions so 
that all annotated texts can be deposited and made freely available to any scholars. 

3 This construction originated from a reduced identificatory cleft construction ys ef ‘it is it’, 
but developed in the course of the Old and Middle Welsh period into an adverb sef meaning 
‘namely’.

forms, which is extremely useful when automatically processing a new text in a
particular language. With a stemmer the task of POS tagging is easier because 
the various forms are reduced to stems + their respective inflection/conjugation. 
Developing a good stemmer or carrying out the procedure manually during the 
pre-processing stage is a tremendous task, however, which is why the detection 
of inflected forms was left to the POS-tagging stage, where the automatic classifi- 
cation algorithm could use specific features to disambiguate words regardless of 
their varying orthography (see Section 3). 

In order to prepare the texts for annotation, a minimum amount of prepara- 
tion is always necessary, however. First of all, the markup was stripped from 
the digitised texts, which were then saved as plain text files (.txt) so that they 
would be in the right input format for the POS tagger. Further pre-processing 
involved the insertion of sentence-final punctuation (if that was not present in 
the edition already) and the deletion of sentence-internal full stops. Finally, ut- 
terance boundaries in the form of <utt> were inserted semi-automatically (auto- 
matically after a full stop and manually if the full stop did not exist). Without 
utterance boundaries, the POS tagger is not able to assign morphosyntactic 
tags to all tokens. In addition, utterance boundaries are useful units for subsequent
NLP tasks like chunking, chinking⁴ and full syntactic parses. The only
punctuation marks that were removed were the full stops preceding and follow- 
ing numbers, e.g. ‘.12.’ was turned into ‘12’ to optimise automatic token recogni- 
tion. Tokenisation (the isolation of words) was done automatically by the POS 
tagger on the basis of word spacing and full stops at the end of an utterance. 
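The automatic part of these steps can be sketched with a few regular expressions. The function names and the placeholder input below are assumptions made for illustration; the sketch covers only the removal of full stops around numbers, the insertion of <utt> after full stops and whitespace tokenisation, and leaves out the manual corrections described above.

```python
import re

NUMBER_STOPS = re.compile(r"\.(\d+)\.")  # numbers written as '.12.'
UTT = "<utt>"


def preprocess(text):
    """Remove full stops around numbers, then mark utterance boundaries."""
    text = NUMBER_STOPS.sub(r"\1", text)
    # Insert the boundary marker after every remaining full stop.
    text = re.sub(r"\.\s*", ". " + UTT + " ", text)
    return text.strip()


def tokenise(text):
    """Split on whitespace and detach a full stop that ends an utterance."""
    return re.sub(r"(?<=\S)\.(?=\s|$)", " .", text).split()


# Placeholder tokens stand in for real text.
prepared = preprocess("aaa bbb .12. ccc. ddd eee.")
print(prepared)
print(tokenise(prepared))
```

Because the number pattern is applied first, the full stops that belong to a number are never mistaken for utterance boundaries.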

The Natural Language Toolkit (NLTK) regular expression
chunk-parser (see Section 4) requires a list of words and tags. Therefore, the 
POS-tagged text files of the format token/TAG were converted to the right input 
format for the chunk-parser using a simple text conversion script designed es- 
pecially for this purpose (see Meelen 2016: 326 for a sample of the code and fur- 
ther details of the chunking process). 
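A conversion of this kind can be very small. The following is a minimal sketch and not the script referred to above: the function name and the handling of untagged tokens are our own assumptions, and the example reuses the tags P and NPR mentioned in Section 2.3.

```python
def read_tagged(line, sep="/"):
    """Turn 'token/TAG token/TAG ...' into a list of (token, TAG) pairs."""
    pairs = []
    for item in line.split():
        # Split on the last separator so tokens containing '/' stay intact.
        token, _, tag = item.rpartition(sep)
        if not token:  # no separator found: keep the token, leave it untagged
            token, tag = tag, None
        pairs.append((token, tag))
    return pairs


print(read_tagged("yn/P Bangor/NPR"))
```

The resulting list of word and tag pairs is the shape of input that the chunk-parser is described as requiring.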


2.3 Splitting and joining tokens


Meelen and Beekhuizen (2013) observed in their pilot study that the huge amount
of orthographic variation in Middle Welsh complicates the POS-tagging task tremendously.
The advantage of using a Memory-Based Tagger (MBT) is that this


4 Chunking is defining a set of words that should be grouped into phrases; chinking is defining
the set of words that should be excluded from those phrases.

type of tagger could filter those out on the basis of the context most of the time
(see Section 3). Splitting and joining certain tokens still had to be done, however. 
When two words are merged together without spaces in the orthography, e.g. 
ymangor (yn + Bangor) ‘in Bangor’, the resulting combination contains a preposi- 
tion ‘in’ and place name ‘Bangor’, but only receives one POS tag. If words like 
these are not split, it is difficult to decide which morphosyntactic tag this should 
be (preposition or place name). Joint tags like ‘P-NPR’ (‘preposition + proper 
noun’) could in theory be created, but the larger the amount of POS tags, the 
more difficult it will be for the algorithm to classify words correctly. Also, a re- 
searcher interested in place names would need the opportunity to automatically 
extract just place names from the corpus and would not like to be confronted with 
extra work sifting through examples of place names combined with prepositions 
(in case of combined tags) or missing examples altogether (if only the preposition 
tag was used). Some tokens, however, were particularly challenging for the auto- 
mated tagger, since very few generalisations could be made from the small train- 
ing set (see Meelen and Beekhuizen 2013). To overcome some of those very 
specific orthographic challenges, the following combinations were automatically 
split using so-called regular expression replacements. Regular expressions are se- 
quences of characters that define search patterns. They allow for more detailed 
searches (and therefore quicker semi-automatic replacements) than simple string 
searches, because they can include logical operators. In this way, three types of
combination were split: fixed combinations with prepositions that cause nasal
mutation like yn ‘in’, e.g. ymwyt > y#⁵ mwyt ‘in food’; conjunctions combined with
definite articles, e.g. ar > a# + r ‘and the’; and particles combined with pronouns,
e.g. ae > a# + e ‘PCL PRO-A’ (particle + accusative pronoun).
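As an illustration of such a replacement, the sketch below builds a single pattern from a small table of fixed combinations. The table contains only the three examples given above, and its layout, the names and the use of word boundaries are assumptions rather than the rules actually applied to the corpus. Short forms can coincide with other words spelled the same way, which is one reason why replacements of this kind remain semi-automatic.

```python
import re

# Hypothetical replacement table based on the three examples in the text.
SPLITS = {
    "ymwyt": "y# mwyt",  # preposition causing nasal mutation + noun
    "ar": "a# r",        # conjunction + definite article
    "ae": "a# e",        # particle + pronoun
}

# One alternation, anchored at word boundaries so that longer words
# which merely contain one of these strings are left untouched.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SPLITS)) + r")\b")


def split_tokens(text):
    """Replace each fixed combination with its split form."""
    return PATTERN.sub(lambda match: SPLITS[match.group(1)], text)


print(split_tokens("ymwyt ar ae"))
```

Keeping the # sign in the output preserves the information needed to recover the original orthography.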


3 POS tagging 


A properly pre-processed version can be tagged automatically by a part-of- 
speech tagger. Although for this first attempt to create an annotated corpus pre- 
processing tasks were minimal, the tokenisation (splitting and combining words) 
alongside the insertion of sentence boundaries was enough to make a memory- 
based algorithm perform well. For Middle Welsh, there was no Part-of-Speech 
tagger available yet. I therefore generated an MBT that could subsequently be 
used to assign morphosyntactic tags to the Middle Welsh data. For this purpose, 


5 The # sign was inserted to indicate word breaks so that researchers are still able to identify
the original orthography.

decisions have to be made concerning the tag set, the list of morphosyntactic labels
for each of the Middle Welsh words. A very detailed tag set facilitates more
(and different types of) research. When working with a corpus of limited size, 
however, use of too many different tags leads to low frequencies and many ha- 
paxes, which in turn complicates the automatic tagging task and yields degraded 
results. In this section I describe these challenges and furthermore offer some 
solutions that are not only useful for those working on Middle Welsh, but for any- 
one working with similarly complex historical data. 


3.1 Establishing the morphosyntactic tag set 


The rich morphology, the initial consonant mutations and the abundant ortho- 
graphical variations of Middle Welsh pose significant problems for any auto- 
mated task in Natural Language Processing. Extensive pre-processing including 
normalisation of spelling and mutations, for example, would simplify the POS 
tagging since the tagger would then be able to recognise a larger number of 
words. Apart from the fact that such regularisation is time-consuming, it entails
a type of editorial intervention that has major implications for future research.
When orthography is regularised, for example, editorial decisions have to
be made concerning what this ‘regular’ spelling should be in the first place.
Regularisation also obscures dialectal variation and differences in scribal practices that
might give crucial insights into the linguistic history and geography at a given
time. Another way to simplify the tagging task is to limit the tag set to a short list
of broad morphosyntactic categories like ‘VERB’, ‘NOUN’, etc. However, this too 
limits the range of research opportunities tremendously. It is therefore worth- 
while to develop an annotation scheme that gives as much morphosyntactic in- 
formation as possible. 

A commonly used annotation scheme adding morphosyntactic information 
to historical corpora is the ‘UPenn standard’ developed initially for Old and 
Middle English texts (see www.ling.upenn.edu/hist-corpora). This annotation
scheme, however, does not always provide enough information to answer cer- 
tain research questions, mainly queries concerning agreement patterns and 
changes in information structure. To enable further research in these and other 
areas, I have used the standardised UPenn scheme, but extended the part-of- 
speech tag set where necessary. Starting from the already extended tag set used 
for the Icelandic corpus (cf. Wallenberg et al. 2011), I have examined the features
of Middle Welsh grammar and systematically added extra inflectional features,
such as person and number, which are attached to the root and tense/mood tags
with a hyphen.
3.1.1 Verbal tags


Verbal inflection in Welsh mainly occurs as a suffix to the verbal stem.
Inflected verbs in the UPenn tag set are tagged VB. Past tense is indicated by -D
(after the regular English past-tense ending -ed), resulting in VBD. For Welsh,
VBD was kept for the preterite tense, and tags were added in the same way for the
present (-P), future (-F, only relevant for irregular verbs), pluperfect (-G, for Welsh
gorberffaith ‘pluperfect’), imperative (-I) and imperfect (-A, for Welsh amherffaith
‘imperfect’). Finally, a distinction was made between indicative
(-I) and subjunctive (-S) mood for the present and imperfect tenses.⁶ This results in
systematic combinations like VBPI (present indicative), VBAI (imperfect
indicative), VBG (pluperfect) etc. The same letters were systematically added
to irregular verbs, resulting in for example DOPI (present indicative of the verb
gwneuthur ‘to do’), GTI⁷ (imperative of the verb cael ‘to get’) or BEAS (imperfect
subjunctive of the verb bod ‘to be’).

Apart from these more-detailed tense-aspect-mood markers, further informa- 
tion was added about the inflection to indicate person and number. Following 
standard glossing practices, person and number were represented as -1SG (first- 
person singular), -2PL (second-person plural) etc. Welsh has a further inflectional 
suffix for the ‘impersonal’ form of the verb that can be used in true impersonal 
contexts meaning ‘one’ or underspecified ‘they’; it is frequently translated into 
English as a passive. The number 4 was employed for this specific suffix and 
added to the verbal tags like the other personal endings, e.g. VBPI-4 (impersonal 
present indicative) or DOAI-4 (impersonal imperfect indicative of the verb gwneu- 
thur ‘to do’). 
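
Because the verbal tags are built up systematically, they can be decomposed mechanically. The following is a minimal sketch that covers only the base tags and letters mentioned in this section (VB, DO, GT, BE; D, P, F, G, A; I, S; person-number endings and the impersonal 4). The regular expression and the function name are assumptions made for illustration; the full tag set in Meelen (2016) contains further tags not handled here.

```python
import re

TENSES = {"D": "preterite", "P": "present", "F": "future",
          "G": "pluperfect", "A": "imperfect"}
MOODS = {"I": "indicative", "S": "subjunctive"}

# base tag, then either imperative -I, or present/imperfect with an optional
# mood letter, or another tense letter; finally an optional person-number ending.
VERB_TAG = re.compile(
    r"^(?P<base>VB|DO|GT|BE)"
    r"(?:(?P<imp>I)|(?P<tense>[PA])(?P<mood>[IS])?|(?P<plain>[DFG]))?"
    r"(?:-(?P<pn>[123](?:SG|PL)|4))?$"
)


def describe(tag):
    """Return the features encoded in a verbal tag, or None for other tags."""
    m = VERB_TAG.match(tag)
    if m is None:
        return None
    if m["imp"]:
        # 'I' directly after the base tag is the imperative (see footnote 6).
        tense, mood = None, "imperative"
    else:
        tense = TENSES.get(m["tense"] or m["plain"])
        mood = MOODS.get(m["mood"])
    person_number = "impersonal" if m["pn"] == "4" else m["pn"]
    return {"base": m["base"], "tense": tense, "mood": mood,
            "person_number": person_number}


for tag in ("VBPI-4", "DOAI-4", "BEAS", "GTI", "VBD-1SG"):
    print(tag, describe(tag))
```

The two uses of I never clash in this scheme, since the imperative I follows the base tag directly while the indicative I only follows P or A.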


3.1.2 Inflected and combined prepositions


Another feature of the grammar, characteristic of Welsh and other Celtic languages
(but also seen in, for example, Semitic languages like Arabic or Hebrew), is inflected
prepositions. Middle Welsh had a specific set of prepositions that could


6 Note that the ‘-I’ for ‘indicative’ can only appear combined with ‘-P-’ (present) or ‘-A-’
(imperfect); the ‘-I’ for ‘imperative’ only appears directly after the core verbal tag (VB/GT/BE/DO).
Since the imperative does not have different tenses this does not lead to any ambiguities in the 
annotation system. 

7 The initial ‘HV-’ is for the auxiliary verb have in the UPenn standard annotation scheme. In 
Welsh, cael can mean ‘have, get’ with auxiliary functions as well, although it is not the exact 
equivalent of English have, which is why GT ‘get’ was chosen for Welsh cael. 
be inflected for person, number and gender (the last of these in the third-person
singular only and only ever with a pronominal object). There are also
‘uninflected’ prepositions in Welsh, but the inflected set includes very common
prepositions like i ‘to’, ar ‘on’ and yn ‘in’. Following the verbal inflectional tags,
Middle Welsh iddi ‘to her’, for example, was tagged as P-3SGF ‘preposition third-person
singular feminine’.

Some prepositions in Welsh could also be combined with other prepositions,
e.g. y dan ‘under, below’ from y ‘to’ + tan ‘under’. These complex prepositions
were tagged P + PX, so they could be recognised as separate, but also as
combined prepositions. A further advantage of this is that the automatic tagger,
which looks at the tags preceding and following the focus word, will not encounter
the rare sequence of two prepositions. A disadvantage remains, of course, that
the tag set is further extended and there are more homophonous forms, which
could lead to worse results if the complex preposition in question does not
occur frequently in the training set.

Welsh also allows for a further type of complex preposition: a combination
of a preposition plus a grammaticalised noun. If the object of this type of
preposition is a pronoun, it can appear in between the two parts as a possessive
pronoun, e.g. yn eu herbyn ‘against/towards them’ from yn ‘in’ plus eu
‘their’ plus erbyn ‘opposition’. There are two possible ways to annotate constructions
that are changing in historical corpora: we can annotate the original structure
and form or the new construction as a whole. Since the exact date of
grammaticalisation is often difficult to determine, it is not always easy to choose
one or the other. As long as the construction is tagged consistently in one text (or
one period of the historical corpus) and the annotation manual is clear about
this, this should not be a problem. In that case, future researchers will always be
able to find and, if necessary, change the annotation again. In this particular
case of combined prepositions, a less conservative annotation scheme was preferred
to facilitate research into prepositional phrases: it disregards the nominal
origin of the construction and yields the tag sequence ‘P PRO-G PX’
(preposition, possessive pronoun, second part of combined preposition).


3.1.3 Distinguishing different types of pronominal forms 


Another part of grammar in which the tag set was extended significantly is 
pronominal forms. Since Welsh has various sets of pronouns for different 
(grammatical) contexts, a more fine-grained distinction here could enhance re- 
search not only in the pronominal domain, but also in Information Structure. 
Conjunctive pronouns like ynteu ‘he (then)’, for example, are used in contexts of 
topic switch, meaning ‘but I’, ‘I, then,’ etc. Reduplicated pronouns like tydi ‘you’,
on the other hand, are only used in focussed contexts. Separate tags for those are 
thus useful for finding the focus domain of sentences. 

A further distinction in the pronominal domain was made between posses- 
sive pronouns and object pronouns. Since the infixed versions of these pro- 
nouns often exhibit the exact same form, a more fine-grained distinction in the 
tag set facilitates syntactic research here as well. Following the extensions of 
the tag set for the Icelandic parsed corpus (see Wallenberg et al. 2011), these 
pronominal tags receive case tags marked with a dash, for example, fy ‘my’ 
PRO-G (‘pronoun genitive’), or e ‘him’ PRO-A (‘pronoun accusative’). 


3.1.4 Additional extensions of the tag set 


Further extensions of the tag set include ADJQ for equative constructions, e.g. 
cochet ‘as red’ (from coch ‘red’ + equative -et) and ADJPL for plural adjectives, 
e.g. gueisson ieueinc ‘young servants’. More detailed tags like these are helpful 
to historical linguists and syntacticians looking at the structure and agreement 
patterns of noun phrases. 

As described above, Welsh employs a wide range of particles. These too 
were tagged separately according to their function (e.g. PCL-QU ‘question parti- 
cle’, PCL-FOC ‘focus particle’, PCL-NEG ‘negative particle’) to help distinguish 
different types of clauses. Aspectual particles like yn ‘progressive’ (PROGR) or 
wedi ‘perfective’ (PERF) were also distinguished from the homophonous predi- 
cative particles (PRED) and prepositions (P) respectively. 

The verbal noun category characteristic of Celtic languages was tagged VN 
for regular verbs. Irregular verbs with verbal nouns that have specific functions 
in Welsh, e.g. cael ‘get’, also used for the passive, received specific verbal noun 
tags. The -N was added systematically to their base forms, e.g. GT- ‘have, get’ > 
GTN ‘verbal noun of the verb cael ‘to get’’. The verbal noun of the verb ‘to be’ 
was tagged as ‘BEN’, although it can also appear in this form in many other 
syntactic contexts, e.g. in complement clauses. 

Finally, some additional lexical items with specific functions were tagged 
separately. An example of this is the above-mentioned petrified form sef 
(tagged SEF), which was used in earlier stages of the language to focus identi- 
ficatory copular sentences. During the Middle Welsh period, it grammatical- 
ised further until it became an adverbial element used in apposition to noun 
phrases meaning ‘that is’ (cf. Latin id est still used as the abbreviation i.e. in 
English). 
One final problem that remained for Middle Welsh was the large amount of
homophony already alluded to in various cases above. Middle Welsh has a large
number of very short words that are spelled the same but have a wide range of
meanings, e.g. a, which can mean ‘and’ or ‘with’ or be a preverbal particle, etc.
This poses problems for automatic classification algorithms. The tagger, however,
was often able to distinguish between up to five different possible meanings
of, for example, Middle Welsh y ‘the, his, her, to, to his/her, in’ etc. on the basis
of the preceding and following context.


3.2 Morphosyntactic annotation 


With the MBT from TiMBL⁸ it is possible to generate a tagger for any language
based on a training set in a systematic token/TAG format with utterance
boundaries at the end of every sentential unit. A memory-based tagger
uses memory-based algorithms to disambiguate and classify words in a corpus.
Unlike other types of taggers, memory-based taggers interweave processing and
learning stages. Whenever a new language item with its classification is encountered
in the training data, it leaves a memory trace that guides subsequent processing.
This means that when a new instance is found and needs to be classified, a
set of relevant instances is selected from memory on the basis of a number of useful
features, and the new token is classified by analogy to that set. This approach
yields robust results, especially when POS tagging languages with orthographical
variation and rich morphology (see Zavrel and Daelemans 1997, and
Daelemans and Van den Bosch 2005 for instructions, background and further
functionalities).⁹


3.2.1 Automatic POS tagging 


Once the Middle Welsh tagger is generated, the settings file of the tagger is 
used to assign POS tags to a new part of the corpus (presented as a tokenised 
text file). Based on the training set, the MBT divides the new text in need of 
annotation into ‘known’ and ‘unknown’ words. Depending on the exact 


8 

9 For the new version of PARSHCWL, the results of this Memory-Based Tagger will be com- 
pared to a state-of-the-art BILSTM-CNN-CRF tagger (see 
to see which of those yields better results and should form the basis of texts that need 
to be annotated and added to PARSHCWL in the future (see Meelen and Willis forthc.). 
parameter settings and the features from the training set stored in memory, the
tagger will then assign a tag to each word. 

In Welsh the inflection appears as a suffix (on verbs or prepositions). When 
the tagger finds an unknown word like ohonaf ‘of me’, for example, it can com- 
pare the last three characters to known words with assigned tags in the training 
set. An example of this could be another inflected preposition, like arnaf ‘on me’ 
with the POS tag P-1SG (‘preposition + first person singular ending’). The exact 
same final characters (in combination with the other tags in the preceding and 
following context) lead the MBT to assign the same tag P-1SG to ohonaf, which 
would be the correct tag. Note that the context of inflected verbs and prepositions 
is often quite distinct (e.g. prepositions can appear after inflected verbs whereas 
inflected verbs do not). Since the automatic tagger is also sensitive to context, it 
will be able to distinguish verbs from prepositions and therefore verbs ending in 
-naf, e.g. canaf ‘I sing’, would be correctly tagged as first-person singular present 
verbs (VBPI-1SG) instead of inflected prepositions (P-1SG).¹⁰
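
The role of the final three characters can be illustrated with a deliberately simplified sketch. It is not how MBT works internally: MBT combines such suffix features with the surrounding tags in a memory-based classifier, whereas the hypothetical function below only counts the tags of known words that share a suffix with the unknown word. The tiny lexicon is made up from the examples in the text.

```python
from collections import Counter


def suffix_votes(word, lexicon, n=3):
    """Count the tags of known words that share the last n characters of word."""
    suffix = word[-n:]
    return Counter(tag for known, tag in lexicon.items() if known.endswith(suffix))


# Toy lexicons (assumed for illustration) built from forms cited in the text.
prepositions_only = {"arnaf": "P-1SG", "llys": "N"}
with_verb = {"arnaf": "P-1SG", "canaf": "VBPI-1SG", "llys": "N"}

print(suffix_votes("ohonaf", prepositions_only))
print(suffix_votes("ohonaf", with_verb))
```

With only arnaf in memory, the suffix -naf points unambiguously to P-1SG. Once canaf is also known, the suffix alone yields a tie between P-1SG and VBPI-1SG, which is exactly the situation in which the preceding and following tags have to decide.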

Known words are easier to tag if there are no homophones with different tags.
If there are homophones, for example the above-mentioned Middle Welsh
word y, the context in which they appear is crucial. In between an adverb
(ADV) and an inflected verb (VB*/GT*/BE*/DO*), y is undoubtedly the preverbal
particle following sentence-initial adjuncts, as in (1a). In front of
verbal nouns, however, as at the end of (1b), y could be the preposition ‘to’
or a possessive pronoun (masculine, feminine or third-person plural), as earlier
in (1b).


(1) a. Tranhoeth y   deuthant      y  ir  llys.
       next.day  PTC come.3PL.PRET to DEF court
       ‘The next day they came to the court.’ (Bromwich and Evans 1992: line
       595, [Culhwch ac Olwen])


    b. a   dyuot   yn y             uryt ac  yn
       and come.VN in 3SG.MASC.POSS mind and in
       y             uedwl uynet y  hela
       3SG.MASC.POSS mind  go.VN to hunt.VN
       ‘and he was minded to go and hunt’ (Williams 1951: lines 3-4 [Pwyll
       Pendeuic Dyuet])


10 Many thanks to an anonymous reviewer who suggested explaining this potentially ambigu- 
ous case. 




The output file of the tagging process is a text file consisting of a word + TAG 
(shown in Figure 1) and an indication whether this word was known (signalled by 
a single forward slash “/”) or unknown (signalled by a double forward slash “//”) 
from the training set. 


Kilyd//NPR mab/N Kyledon//NPR Wledic//NPR a/PCL uynnei/VBAI-3SG
wreic/N kynmwyd/ADJQ ac/P ef/PRO ./PUNC <utt>

Sef/SEF gwreic/N a/PCL uynnwys/VBD-3SG ,/PUNC Goleudyt//NPR
merch/N Anlawd//NPR Wledic//NPR ./PUNC <utt>


Figure 1: Fragment of the output of the POS tagger for the text Culhwch and Olwen. 
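
An output file in this format is easy to read back in for further processing. The sketch below assumes exactly the conventions described above (a single slash for known words, a double slash for unknown words, and <utt> as the utterance boundary); the function names are hypothetical.

```python
def parse_token(token):
    """Split 'word/TAG' or 'word//TAG' into (word, tag, known)."""
    if "//" in token:
        word, tag = token.split("//", 1)
        return word, tag, False
    word, tag = token.rsplit("/", 1)
    return word, tag, True


def read_output(text):
    """Return a list of sentences, each a list of (word, tag, known) triples."""
    sentences, current = [], []
    for token in text.split():
        if token == "<utt>":
            sentences.append(current)
            current = []
        else:
            current.append(parse_token(token))
    if current:  # text that does not end in an utterance marker
        sentences.append(current)
    return sentences


sample = ("Sef/SEF gwreic/N a/PCL uynnwys/VBD-3SG ,/PUNC Goleudyt//NPR "
          "merch/N Anlawd//NPR Wledic//NPR ./PUNC <utt>")
for word, tag, known in read_output(sample)[0]:
    print(word, tag, "known" if known else "unknown")
```

Keeping the known/unknown flag makes it straightforward to direct manual correction to the unknown words first, since these are the tokens the tagger is least reliable on.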


MBT allows for different settings according to the features of the words them- 
selves or the context in which they appear. In order to obtain the maximally 
reliable tags, a wide range of parameter settings was tried, varying those fea- 
tures. The optimal settings for Middle Welsh known (-p) and unknown (-P) 
words are the following (see Meelen 2016: 331-332 and Daelemans et al. 2010 
for further details on these specific settings): 

- -p dfa

- -P sssdFawchn

- -M 200 -n 5 -% 5 -O +vS -F Columns -G K: -a 0 U: -a 0 -m M -k 17 -d IL


For Middle Welsh, the corrected gold standard of one text was subsequently 
used to annotate other texts of the Mabinogion and the laws automatically with 
greater accuracy. Each of those texts was in turn manually corrected as well. 


3.2.2 POS tagging results 


In order to estimate the quality of the POS tagger and obtain optimal parameter 
settings, I evaluated the manually annotated data with a ten-fold cross-validation, 
i.e. taking 90% of the data, training the model on that subset and then testing 
it on the other 10%, repeating this procedure for ten 90%/10% splits. Since the
ten percent that the model is tested on is manually checked, we can see how often 
the model assigns the correct tag to a word, as well as obtain insightful statistics 
about the over- and under-generalisations of some tags. The above-mentioned set- 
tings gave the following results for the 59,000-word Middle Welsh corpus (see 
Meelen 2016: 40-44 for a full overview and further discussion of the results): 




- Global Accuracy: 90.4% 
- Global Accuracy known words: 93.3% 
- Global Accuracy unknown words: 63.3% 
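
The evaluation procedure itself can be sketched in a few lines. Note that the tagger in this sketch is only a stand-in: it assigns each known word its most frequent tag and every unknown word a default tag, which is an assumption made to keep the example self-contained and has nothing to do with the MBT settings above. The function names and toy data are likewise hypothetical.

```python
from collections import Counter, defaultdict


def k_fold_splits(sentences, k=10):
    """Yield k (train, test) pairs; each sentence is tested exactly once."""
    for i in range(k):
        test = sentences[i::k]
        train = [s for j, s in enumerate(sentences) if j % k != i]
        yield train, test


def train_baseline(train):
    """Stand-in tagger: remember the most frequent tag of every word."""
    counts = defaultdict(Counter)
    for sentence in train:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def evaluate(sentences, k=10, default_tag="N"):
    """Return global, known-word and unknown-word accuracy over k folds."""
    hits, totals = Counter(), Counter()
    for train, test in k_fold_splits(sentences, k):
        lexicon = train_baseline(train)
        for sentence in test:
            for word, gold in sentence:
                kind = "known" if word in lexicon else "unknown"
                guess = lexicon.get(word, default_tag)
                for key in ("global", kind):
                    totals[key] += 1
                    hits[key] += guess == gold
    return {key: hits[key] / totals[key] for key in totals}


toy = [[("mab", "N")]] * 9 + [[("uynnei", "VBAI-3SG")]]
print(evaluate(toy))
```

Reporting known and unknown words separately, as in the list above, shows where the errors come from: a word that never occurs in the training portion of a fold can only be tagged on the basis of its form and context.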


To give a better insight into the success of the tagger, I calculated the Precision
(percentage of system-provided tags that were correct), Recall (percentage of tags in
the input that were correctly identified by the system) and F-score (weighted
harmonic mean of Recall and Precision). We find high results for simple tags like
CONJ ‘conjunction’ or N ‘noun’ that occur extremely often. As expected, Precision
and Recall for tags occurring only once or twice are extremely low. These tags are
often forms of verbs that occur very infrequently with irregular endings. Precision
and Recall give more insight into the degree to which the model over- or
under-generalises certain tags for the individual categories. The genitive (possessive)
pronoun category (PRO-G), for instance, is correct in about 90% of the cases where it is
applied (90% Precision), but out of all actual possessive pronouns, only 65% are
recognised (Recall of 65%). This is understandable, because the possessive pronoun
usually consists of only one letter that is homophonous with the infixed object
pronoun. The model thus under-generalised that category in particular. If, on the
other hand, 95% of the actual conjunctions are recognised as such, while
the item is only classified as a conjunction correctly in 90% of the cases, this category
would be slightly over-generalised. As expected, the F-score for frequently occurring
tags is considerably higher than that for tags and tokens occurring only
once or twice in the corpus. The extremely fine-grained tag set with over 200
morphosyntactic labels (see the Appendix of Meelen 2016 for a full overview) can thus
only reach a good Global Accuracy in a large corpus. This first corpus is not very
large, which is why a Global Accuracy of over 90% is an acceptable result.
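
To make the relation between these measures concrete, the sketch below computes the F-score for the two cases just discussed. It assumes the balanced F-score (equal weight for Precision and Recall, beta = 1), since the weighting used in the evaluation is not specified here; the counts passed to precision_recall are invented for illustration.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and Recall from raw counts for a single tag."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of Precision and Recall (beta=1: balanced)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# PRO-G: 90% Precision but only 65% Recall (under-generalised)
print(round(f_score(0.90, 0.65), 3))
# Conjunction example: 90% Precision, 95% Recall (slightly over-generalised)
print(round(f_score(0.90, 0.95), 3))
# Invented counts: 9 correct, 1 wrongly assigned, 3 missed
print(precision_recall(9, 1, 3))
```

Because the harmonic mean is pulled towards the lower of the two values, the low Recall of PRO-G brings its F-score down to roughly 0.75, while the conjunction example stays above 0.92.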

To conclude this section, Middle Welsh presents a good test case for POS tag- 
ging a historical corpus of a language with rich verbal and prepositional inflec- 
tion and non-standardised orthography. The MBT showed robust results and 
flexibility with the highly variable orthography of minimally pre-processed 
Welsh texts. The parameter settings of the MBT software allow for focus on the 
context and the last 3 letters of unknown words. Since Middle Welsh verbal end- 
ings usually consist of 2/3-letter suffixes (reflecting tense, mood, aspect, person 
and number combined), it is not difficult for the tagger to predict the right form 
(e.g. gwel-eis ‘I saw’ as VBD-1SG denoting ‘first-person singular preterite’). Other 
parameter settings like an additional focus on the first 3 letters of the word 
proved to be less helpful for a language like Welsh with initial consonant muta- 
tion. This might, however, improve the results for languages with a strong prefix- 
ing preference, like for example Navajo (Young and Morgan 1980: 103, 107). 
4 Annotating syntax and information structure


In order to facilitate syntactic queries, the above-described morphosyntactic an- 
notation was employed to develop hierarchical phrase structure as well. A full 
parse would require a detailed Context-Free Grammar or Dependency Grammar. 
Developing this was beyond the scope of the present study, however. Instead, I 
modified the rule-based chunk-parser available in the Natural Language Toolkit 
(NLTK) in such a way that not only phrasal chunks but also some theory-neutral 
hierarchical structure could be added. 


4.1 Designing the rule-based grammar 


The NLTK rule-based chunk-parser is a regular expression parser: it systemati- 
cally combines POS tags as defined in a grammar that allows regular expressions 
to create more (specific) options. Frequently used regular expressions include: 


? = the preceding item is optional
| = ‘or’


The combination of words with their POS tags into phrases, e.g. Noun Phrases 
(NPs), Determiner Phrases (DPs) or Prepositional Phrases (PPs), is achieved 
with the following sample pattern of commands: 

- NP: {<N|NPL|NPR>}

- DP: {<D><ADJ|ADJPL>?<NP>}

- PP: {<P><NP|DP>}


According to the above set of rules, a noun phrase (NP) can be formed of words 
with one of three different POS tags: a noun (N) or a plural noun (NPL) or a 
proper noun (NPR). Similarly, a DP, in this grammar, is formed by a determiner 
(D) followed by a noun phrase (NP) with an optional adjective (singular ADJ or 
plural ADJPL) in between. The order in which this rule-based grammar operates 
is important. The DP-rule above must follow the NP-rule to find the label <NP>. 
In this way single-layered hierarchical structures (NPs within DPs) were cre- 
ated. Similarly, a further layer could be created resulting in a PP containing a 
DP containing an NP, as long as they are called in the right order. 
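
The effect of rule order can be demonstrated with a small stand-alone sketch. It does not use NLTK itself: it is a simplified emulation, written with the standard library only, of how such a cascade of tag patterns builds layered structure, and all function names are hypothetical. The three rules are the sample rules given above.

```python
import re

# Rules are applied in this order, so later rules can refer to earlier labels.
GRAMMAR = [
    ("NP", "<N|NPL|NPR>"),
    ("DP", "<D><ADJ|ADJPL>?<NP>"),
    ("PP", "<P><NP|DP>"),
]


def label_of(node):
    """Leaves are (word, tag) pairs; chunks are (label, [children]) pairs."""
    return node[0] if isinstance(node[1], list) else node[1]


def compile_rule(pattern):
    """Turn '<D><ADJ|ADJPL>?<NP>' into a regex over strings like '<D><NP>'."""
    return re.compile(re.sub(r"<([^>]+)>", r"(?:<(?:\1)>)", pattern))


def apply_rule(nodes, label, pattern):
    """Wrap every sequence of nodes matching the pattern in a new chunk."""
    regex = compile_rule(pattern)
    text, index_at = "", {}
    for i, node in enumerate(nodes):
        index_at[len(text)] = i  # character offset -> node index
        text += f"<{label_of(node)}>"
    index_at[len(text)] = len(nodes)
    out, done = [], 0
    for m in regex.finditer(text):
        start, end = index_at[m.start()], index_at[m.end()]
        out.extend(nodes[done:start])
        out.append((label, nodes[start:end]))
        done = end
    out.extend(nodes[done:])
    return out


def chunk(tagged, grammar=GRAMMAR):
    nodes = list(tagged)
    for label, pattern in grammar:
        nodes = apply_rule(nodes, label, pattern)
    return nodes


sentence = [("y", "P"), ("ir", "D"), ("llys", "N")]  # 'to the court', cf. (1a)
print(chunk(sentence))
print(chunk(sentence, list(reversed(GRAMMAR))))
```

With the rules in the order given, the result is a PP containing a DP containing an NP. With the order reversed, the PP and DP rules run before any NP exists, so only the innermost NP is found.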

This is all straightforward in a language with extremely simple noun
phrases and/or with a very limited number of POS tags. Middle Welsh noun
phrases, however, present some problems in this respect. First of all, some adjectives
can either follow or precede the noun they modify, with different meanings
in the two positions. In addition to this, possessive pronouns and
quantifiers can be part of the noun phrase as well. Furthermore, demonstratives
must follow the noun (and its modifying adjectives) and they are also obligatorily
accompanied by the definite article preceding the noun phrase, as in (2).
Finally, phrases with numerals in Welsh come in many shapes and forms, as
(3) shows. Welsh numerals above ten can be split to occur before and after the
noun phrase. In addition to that, phrases with numerals can also employ the
preposition o ‘of’.


(2) a. y   cathod mawr
       DEF cats   big
       ‘the big cats’
    b. yr  hen lyfr mawr hwn
       DEF old book big  PROX
       ‘this big old book’

(3) a. tair      o  ferched
       three.FEM of girls
       ‘three girls’
    b. tri        o  bobl   eraill/newydd
       three.MASC of people other.PL/new
       ‘three other/new people’
    c. dau      hen lyfr
       two.MASC old book
       ‘two old books’
    d. un  mlynedd ar ddeg
       one year    on ten
       ‘eleven years’

Complex noun phrases can also consist of two juxtaposed nouns, as in (4). In 
these constructions, the definite article only appears before the second noun, 
but the whole construction is definite. 


(4) a. dyn y   siop
       man DEF shop
       ‘the man of the shop’
    b. pob   yn   ail    fis
       every PRED second month
       ‘every other month’
    c. yr  holl broblemau
       DEF all  problems
       ‘all the problems’
The above types of complex noun phrases require a very detailed rule-based
grammar that includes all possible phrases, including some phrases with 
special labels to facilitate further syntactic queries, e.g. phrases with verbal 
nouns (which can function as infinitives or nouns). Such a rule-based gram- 
mar explicitly looks for sequences of, for example, numerals + adjectives + 
nouns such as dau hen lyfr ‘two old books’ in (3c). Similarly, when put in the 
right order, complex noun phrases with quantifiers can be combined with ex- 
plicit searches for the sequence determiner + quantifier + noun (D>Q>N). The 
full rule-based grammar I designed can be found in Meelen (2016: Appendix). 
This is a flexible template that can easily be extended and adapted to achieve 
better results when more texts are added. 


4.2 Manual correction 


No automatic NLP task is 100% correct. The rule-based chunk-parser performs
very well with simple matrix clauses, but subordinate clauses and some complex
DPs in particular need some correction. I manually corrected the entire
corpus using CesaX, a software package developed by Erwin
Komen to facilitate corpus-linguistic research (cf. Komen 2011). Another useful
feature of CesaX is that it can automatically convert the chunk-parsed .psd-files
to XML-files with a simplified TEI P5 header. These files can then be queried
using CorpusSearch¹¹ or the XML-based XQuery language. Manual correction in
CesaX is quick and easy, because of its graphic representation of the tree structures.
Alternatively, the bracket representation shown below can also be edited
manually with any text editor if needed:


(S (DP (NP (N taryan))(ADJP (ADJ eur))(NP (N grwydyr))) 
(VP (PCL a)(VBD-3PL dodassant)) 
(PP (P dan)(DP (PRO-G y)(NP (N penn)))) (. .)) 


Figure 2: Sample bracket representation. 
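
Since the bracket representation is plain text, it can also be read programmatically, for instance to check a file after manual editing. The following is a minimal sketch of such a reader, assuming well-formed input in which every opening bracket is followed by a label; the function names are hypothetical and the reader is not part of CesaX or CorpusSearch.

```python
import re


def read_tree(text):
    """Parse one labelled-bracket tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        pos += 1                 # skip the opening bracket
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                 # skip the closing bracket
        return (label, children)

    return parse()


def leaves(tree):
    """Return the (word, tag) pairs of a tree in sentence order."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(leaves(child))
        else:
            pairs.append((child, label))
    return pairs


figure_2 = """
(S (DP (NP (N taryan))(ADJP (ADJ eur))(NP (N grwydyr)))
   (VP (PCL a)(VBD-3PL dodassant))
   (PP (P dan)(DP (PRO-G y)(NP (N penn)))) (. .))
"""
print(leaves(read_tree(figure_2)))
```

Recovering the word/tag pairs from the tree in this way gives back the POS-tagged sentence, which is a quick consistency check after hand-editing the brackets.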


11 CorpusSearch is a query language that finds syntactic structures in a corpus of annotated 
sentence trees. It can be used as a development tool for building the corpus or as a research 
tool to find and collate results in a corpus (see Section 5.1). 
The above output from the automatic chunk parser reflects the following example:


(5)  Taryan eur  grwydyr   a   dodassant    dan   y             penn
     shield gold enamelled PTC put.3PL.PRET under 3SG.MASC.POSS head
     ‘They placed a gold enamelled shield under his head.’ (Williams 1908:
     lines 18-19 [Breuddwyd Maxen])


4.3 Annotating information structure 


Information-structural features were added semi-automatically. With CorpusStudio 
software (see Komen 2009 and Section 5), various features can be automatically 
added. Information for these features can be derived from the detailed POS 
tags of the specific words, from the phrasal structure and/or from the context 
in which it occurs. For example, since personal pronominal subjects usually 
convey ‘old’ information, with some simple XQuery commands the referential 
status of these subject pronouns can be automatically labelled ‘Old’. Other 
specific features of the clause such as the tense, aspect or mood of the verb or 
the person-number inflection can be derived from the detailed set of POS tags 
in the same way. 

Further information-structural notions such as topic or focus are not as 
easy to detect automatically. However, if special focus words, pronouns or par- 
ticles are used, these were labelled as such by the detailed POS tagger and 
therefore the focus domain or articulation of the sentence can automatically be 
annotated accordingly. In addition to this, constituent focus in Middle Welsh 
could be indicated by a (reduced) cleft and a verb with default third-person sin- 
gular inflection. Pronominal subjects in the first or second person or plural full 
DPs can be automatically detected as well. When it comes to labelling the exact 
type of topic (e.g. familiar, aboutness or contrastive) or focus, manual annota- 
tion is still required. 

All additional features (including the information-structural ones discussed 
here) are added at the matrix clause level. In practice, this means a list of fea- 
tures with automatically derived values (by querying the POS tags) and open 
values (to be adjusted manually) is available for every matrix clause. These
features include:¹²


12 These features are chosen because combined they cover all relevant information-structural 
notions. See Meelen (2016: Chapter 2) for further motivation with detailed examples from Middle 
Welsh and other languages. 
- Focus articulation, e.g. constituent focus
- Focus particle/word, e.g. hefyd ‘also’
- Point of departure, e.g. temporal clause ‘at that moment...’
- Information flow, e.g. unmarked
- Referential state subject, e.g. old information
- Referential state object, e.g. new information
- Diathesis, e.g. impersonal verb
- Tense/aspect, e.g. imperfect
- Mood, e.g. subjunctive
- Semantic roles (in order), e.g. agent-patient
- Animacy and definiteness subject, e.g. definite-animate
- Animacy and definiteness object, e.g. indefinite-inanimate


5 A brief note on possibilities to query the data 


There are various online tools available for corpus research, e.g. the search
interface for the British National Corpus.¹³ Search interfaces provide easy access
to the data, because no prior knowledge of specific search algorithms is necessary
to get results. The types of searches are often limited to the level of
individual words or simple part-of-speech labels, however, which is not sufficient
for syntactic or more detailed linguistic analyses. If we want to gain a
deeper insight into our linguistic data, we need a more thorough way of searching
for the right information.


5.1 Extended querying with CorpusStudio and CesaX 


For many historical syntacticians, CorpusSearch¹⁴ is a useful application that
can retrieve the detailed linguistic data relevant to them. It enables queries in 
the treebank or labelled bracketing format (the .psd format described above). 
With CorpusSearch, these first POS-tagged and chunk-parsed files can thus be 
easily queried and, for example, compared to data from historical corpora of 
other languages. It should be noted, however, that this first partial corpus only 
contains an extended shallow parse without syntactic empty categories yet. 


13 .natcorp.ox.ac.uk, 


14 http://corpussearch.sourceforge.net
These will, however, be added to PARSHCWL in the future (see Meelen and
Willis forthc.). 

Another way to retrieve detailed syntactic information is by converting the
(parsed) files to XML format and querying them with the usual search languages for
XML databases: XQuery and XPath. Erwin Komen developed a wrapper around
CorpusSearch2 (Randall, Taylor and Kroch 2005) and XQuery to facilitate these
searches: CorpusStudio (Komen 2009). CorpusStudio not only simplifies the
task of formulating search queries, it also provides easy ways to organise them
along with the corpus data and research logs documenting your goals, subqueries,
definition files and any emendations while gathering the right data.


5.2 Textual markup 


For the metadata markup, I chose a simplified version of the TEI P5 (TEI 
Consortium 2009) header that is suitable for philological data, translations and 
linguistic annotation in XML format. This simplified TEI P5 header was selected 
because the full header with all its details was unnecessary and inefficient to 
work with. In addition, parsed files can be converted to this simplified TEI P5 
header automatically by the CesaX software (see Section 4 and Komen 2011). 
Any information about the philological background of the text can be stored in 
this header and easily retrieved for future online usage. In the textual markup, 
any changes to the annotation can be indicated as well to trace the history of 
the annotated text and corpus as a whole. Finally, it would ultimately be possi- 
ble to combine different versions of the texts (i.e. diplomatic and critical edi- 
tions) into one XML file to make sure invaluable philological information is not 
lost. Its systematic but flexible nature would allow future conversion to JSON 
and RDF format as well. 


6 Conclusion 


This chapter presents the first steps towards the creation of a fully annotated 
corpus of historical Welsh. The above description of the proposed procedure is 
meant as a blueprint for the development of a fully parsed historical Welsh tree- 
bank (PARSHCWL) in the future (see Meelen and Willis forthc.). I described 
how a combination of minimal pre-processing, a systematic extension of cur- 
rent tag sets for historical corpora and a hierarchical way of chunk-parsing can 
yield important information needed to address questions about the syntax and 
information structure of the Welsh language that have hitherto been
unanswered. At the current stage, the annotation of the corpus was done in such a
way as to optimise the search queries specific to the change of word order in 
the early Middle Welsh period (see Meelen 2016). However, the same annotated
partial corpus of Middle Welsh has also already been used for studies on adjectival
agreement in native and translated Welsh prose (see Meelen and Nurmio 2020; 
Parina and Poppe forthc.). 

The flexible XML-based nature (compatible with the .psd file structure) of the 
corpus means that any further philological or linguistic annotation can be added 
at a later stage as well. At various stages in the process of creating the corpus, 
manual correction was necessary. For this pilot, there was only one annotator
available to do the manual correction of the limited pre-processing and the more
detailed POS tagging and parsing tasks. Therefore, checking inter-annotator
agreement, which is needed to verify the results, was not an option. In future,
when making the annotated files accessible for everyone online, a final check 
will be done to filter out any possible mistakes and/or inconsistencies. 

This chapter presents a good test case for annotating a partial historical 
corpus of a language with rich verbal and prepositional inflection. The main 
challenges in building annotated corpora like these lie in the availability of 
good digitised diplomatic or critical text editions. Further collaboration with 
scholars specialised in the philological background producing these editions 
can help linguists to make the right decisions, both in terms of selecting the 
right texts and editions for the corpus, but also in pre-processing and
tokenisation in particular. More elaborate pre-processing of the texts, including the
development of a good stemmer for normalisation, and expanding the
training set take time, but will yield better results for the automatic POS
tagging and parsing tasks in the end. A relatively large set of over 200
morphosyntactic tags was developed and presented here, because those details give
important new research opportunities for both Welsh philologists and linguists. 
A standardised way of expanding the tag set for richly inflected languages is
called for, and the proposed extensions outlined in Section 3 above aim to be a
good starting point for future extension and refinements when more texts are
added to the corpus (see Meelen and Willis forthc. for this and subsequent 
steps towards building a fully parsed Welsh treebank). 


Theodorus Fransen 

3 Automatic morphological analysis 
and interlinking of historical Irish 
cognate verb forms 


1 Introduction 


The main aim of the author’s research project is to use computational approaches 
to gain more insight into the historical development of Irish verbs. One of the
objectives is to investigate how a link between the electronic Dictionary of the Irish
language (eDIL),¹ covering the period c. 700-c. 1700, but focussing on Early Irish
(7th-12th centuries), and the nascent Foclóir Stairiúil na Gaeilge ‘The Historical
Dictionary of Irish’,² covering the period 1600-2000, could be implemented. Such
a link will be hugely beneficial for scholars operating at the intersection of the
medieval and modern period (see Table 1), who currently lack a comprehensive
lexical resource for the “intermediate” early modern period.

The above-mentioned lexicographical discontinuity is problematic, and needs 
to be remedied, especially in the light of the pervasive changes in the verbal sys- 
tem between Early and Modern Irish. The author's motivation for focussing on the 
verbal system in Early Irish resonates with the following observation by McCone 
in his authoritative monograph on the Early Irish verb: 


Concentration upon the verb was dictated by its generally conceded status as the most 
difficult and interesting area of Old and Middle Irish morphology and few would deny 
that an understanding of the Old Irish system's workings and development into and 
through Middle Irish is a prerequisite for being able to deal with the abundance of Old 
and Middle Irish texts effectively. (McCone 1997: xviii) 


During the author's research it was found that eDIL does not provide full verb para- 
digms for many verb entries. It was felt that additional language technology is nec- 
essary to deal with the complex Early Irish verbal system. Such technology will also 
facilitate more systematic and comprehensive interlinking of verb forms in lexico- 
graphical resources. The main contribution to this end by the author is the develop- 
ment of a morphological analyser for Old Irish, which is also the focus of this paper. 


1 Available at: [accessed 7 February 2019]. 


2 Available at: https://www.ria.ie/research-projects/focloir-stairiuil-na-gaeilge [accessed 7 February
2019].


3 Open Access. © 2020 Theodorus Fransen, published by De Gruyter. This work is licensed 
under the Creativ 
In order to make this contribution accessible to (computational) linguists
whose research area is not Old Irish, a brief overview of the Irish language periods
(section 2) and the basics of the Old Irish verbal system (section 3) are provided. The
latter aims to show how phonology imposes itself on verb morphology, resulting 
in an often complex relationship between an underlying verb root and a verb’s 
multiple surface shapes, an insight crucial for the computational implementation.
Section 4 sums up important changes in the verbal system in Middle Irish and be- 
yond. In the second half of the paper, the focus is on digital resources for historical 
Irish and Natural Language Processing methods. Section 5 surveys important exist- 
ing digital resources and computational methods used to deal with historical texts. 
The proposed methodological framework of the paper is the topic of section 6. 
Section 7 introduces finite-state morphology and presents some highlights, as well 
as challenges, in the development of a morphological analyser for Old Irish verbs. 
The formulation of clear-cut verb stem entities constitutes a key feature in the 
implementation. Suggestions for automatically linking cognate verb forms are 
presented in section 8. A synthesis of matters discussed in this paper follows 
in section 9, which also outlines some research prospects. 


2 A historical sketch of the Irish language 


The historical period of Irish can be divided into the language stages shown in 
Table 1 below. Greene (1966) provides a succinct overview of the history of the 
Irish language. Early Irish represents the language from the early medieval pe- 
riod up until about 1200. After that we speak of Modern Irish. Old Irish, like the 
modern standardised language, can be treated as a normative phase in the 


Table 1: Medieval and Modern Irish language periods.


Language stage                                                          Time period
Early Irish     Old Irish                                               7th–9th centuries A.D.
                Middle Irish                                            10th–12th centuries
Modern Irish    Early Modern Irish (including Classical Modern Irish)   13th–mid-17th centuries
                Post-Classical Modern Irish                             mid-17th–mid-19th centuries
                Irish of the Revival period                             late 19th–early 20th centuries
                contemporary standardised Modern Irish                  1958–present
history of the language. Indeed, “classical” Old Irish, the language as
witnessed predominantly in Old Irish glosses in Latin manuscripts, is the basis for
many grammars and handbooks, including A Grammar of Old Irish by Rudolf 
Thurneysen (1946) (GOI). While representing a stable and normative phase in 
the language’s history (McCone 1997: 166), Old Irish shows diachronic as well 
as synchronic linguistic variation (for a discussion of the latter see McCone 
1985). However, the linguistic variation in Old Irish is negligible compared to 
the unstable and highly variable language seen in Middle Irish texts. As 
McCone (1997: 166-167) has pointed out, Middle Irish comprises standard Old 
Irish forms and forms anticipating Modern Irish usage, as well as forms that are 
consonant with neither. The end of the Middle Irish period sees the production 
of the great medieval Irish manuscripts.⁴

The subsequent Early Modern Irish period (13th—mid-17th centuries) is domi- 
nated by a literary genre of praise poetry in syllabic verse composed by court 
poets, referred to as Classical Modern Irish (McManus 1994). In contrast to the 
highly regulated grammar of this bardic poetry, however, hugely varying registers 
can be observed with prose texts of this period, ranging from archaic language to 
registers that are not far removed from 19th-century Irish (Ó hUiginn 2013: 87-89). 

Post-Classical Modern Irish refers to the literary period between the downfall 
of the Irish-speaking aristocracy in the early 17th century and the Great Famine 
(1845-1849), which is characterised by — amongst other developments — a more 
regional orientation in writing (Ó Háinle 2006). The classical literary standard that 
had emerged in the early modern period gradually gives way to writing conven- 
tions that more closely reflect the contemporary spoken language, resulting in the 
coming to the fore of the Irish dialects in texts of this period (Williams 1994). 

The period between the Great Famine and the creation of the Free State 
(1922) is known as the Gaelic Revival, which witnessed an increased production 
of original work, facilitated by institutions such as the Conradh na Gaeilge 
[Gaelic League], established in 1893 (Mahon 2006). After independence, plans 
were made for a standardisation of Irish grammar and spelling, ultimately codi- 
fied in a 1958 booklet published by the Irish government's Translation 
Department (with further revisions in 2012 and 2016).⁵


3 Ó Cróinín (2001) discusses diachronic orthographical developments in the earliest Old Irish 
glosses. Two conventions, the "Irish" and "British" system, seem to have competed with each 
other; the latter ultimately became the standard for all subsequent Irish literature. 

4 Lebor na hUidre, Rawlinson B 502 and The Book of Leinster (An Leabhar Laighneach) 
(Breatnach 1994a: 222—225). 

5 Available at: https: 


hn-caighdean-oifigiuil-201/ en.pdl] [accessed 7 February 2019]. 
3 The Old Irish verb: The morphology-phonology
interface 


3.1 The main skeleton of the verbal complex 


In general, Old Irish is a VSO language (Russell 2005: 430). However, additional 
variant structures are found (Mac Coisdealbha 1998), especially regarding the 
subject position (Lash 2014b). The verb, subject and object may all be contained
within the “verbal complex” (see McCone 1997: 1-19), comprising everything that 
falls within the accentual domain of the verb (Stifter 2009: 84), as such poten- 
tially constituting a highly synthetic “word”. Leaving aside copula constructions, 
Old Irish inflected verb forms incorporate the subject; no independent subject 
pronouns exist. Third person forms, from the viewpoint of word-based parsing,
are inherently ambiguous in that there might or might not be an independent 
subject. In the present work, as is customary, third person verb forms are not 
glossed with a pronominal subject in the English translation. 

There is a distinction between “simple” and “compound” verbs. Verbs with 
the verb root as their sole lexical element, as in (1) and (2), containing root ber, are 
called simple. A compound verb additionally takes one or more preceding lexical 
preverbs (PV), originating in prepositions,⁶ modifying the meaning of the verb
root. In (3) and (4), the preverb is underlying/historical to, combined with the root
ber. As a rule, the first preverbal element within the accentual domain of the verb 
is realised as a proclitic, resulting in a juncture between, put simply, a prefix and 
the stressed part of the verbal complex, as in (2)-(4). This juncture is denoted by a 
mid-high dot to facilitate grammatical analysis; it is not present in manuscripts. 


(1) beirid
    carry.3SG.PRES
    ‘carries’

(2) ní·beir
    NEG-carry.3SG.PRES
    ‘does not carry’

(3) do·beir (to-ber-)
    PV-bring.3SG.PRES
    ‘brings’


6 Called thus by Thurneysen (GOI §§ 819–821).

(4) ní·tabair (to-ber-)
    NEG-bring.3SG.PRES
    ‘does not bring’


Some commonly used grammatical notions relating to stem and ending
formation are key to understanding the workings of the verbal complex. First, there
are two ending sets, “absolute” and “conjunct”. Only simple verbs can take
absolute endings, and only when occurring in clause-initial position. The
conjunct ending set applies when a verb is conjoined with a preverbal element;
compound verbs therefore invariably carry conjunct endings, while simple
verbs take this set of endings when preceded by the preverbal “conjunct
particles” (C), e.g. the negative particle ní ‘not’, as in (2). In (1), -id is the third
singular present indicative absolute ending. The corresponding conjunct ending is
seen in (2)-(4), where palatalisation of the root-final r (orthographically
encoded by preceding i) is the only marker of inflection.

A verb preceded by a conjunct particle is said to be dependent, and
independent otherwise. The distinction between independent and dependent has
major repercussions for the surface shape of especially compound verbs.
Generally speaking, an independent compound verb appears in its
“deuterotonic” form as the first preverb is realised as a proclitic, causing the stress to
fall on the second element (the verb root in [3]). When the proclitic “slot” in the
verbal complex is occupied by a conjunct particle, as in the (dependent)
compound form in (4), the stress is on the verb's first preverb; this stem alternant is
accordingly called “prototonic”.⁷ As (3) and (4) illustrate, the stress system of
Old Irish may result in “complex synchronic morphophonemic alternations”
(Stifter 2009: 90) and, consequently, a system of “double stem formation”
(Russell 2005: 431). The abundant allomorphic variation seen in the Old Irish
verbal system raises a question crucial for implementational purposes: what
exactly is a verb stem in Old Irish? Section 7.2.2 will detail how this question has
been tackled from a computational point of view.
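The interplay of dependency, ending set and stem alternant described above can be summarised as a small decision procedure. The following Python sketch is illustrative only: the function name and its parameters are hypothetical, it is not the implementation discussed in Section 7, and it deliberately ignores the exceptions listed in footnote 7 (the imperative, vowel elision with to, fo and ro, and proclitic conjunctions).

```python
def classify(is_compound, has_conjunct_particle, has_other_proclitic=False):
    """Return (dependency, ending set, stem alternant) for a finite verb.

    Simplified sketch of the main rules only. `has_other_proclitic` covers
    a simple verb preceded by a preverbal element that is not a conjunct
    particle (e.g. the augment ro or the particle no).
    """
    dependency = "dependent" if has_conjunct_particle else "independent"
    if is_compound:
        # Compounds always carry conjunct endings; stress placement
        # decides between the two stem alternants.
        stem = "prototonic" if has_conjunct_particle else "deuterotonic"
        return dependency, "conjunct", stem
    # Simple verbs: absolute endings only when nothing precedes the verb.
    preceded = has_conjunct_particle or has_other_proclitic
    ending = "conjunct" if preceded else "absolute"
    return dependency, ending, None


beirid = classify(is_compound=False, has_conjunct_particle=False)    # (1)
ni_beir = classify(is_compound=False, has_conjunct_particle=True)    # (2)
do_beir = classify(is_compound=True, has_conjunct_particle=False)    # (3)
ni_tabair = classify(is_compound=True, has_conjunct_particle=True)   # (4)
```

Applied to examples (1)-(4), the function returns the absolute set for beirid only, and distinguishes deuterotonic do·beir from prototonic ní·tabair.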


7 There are some exceptions to the rules laid out here in relation to dependency, the distinc- 
tion deuterotonic/prototonic and the set of endings that is demanded. In the imperative only 
conjunct endings exist, and compounds in the imperative appear in their prototonic form re- 
gardless of dependency. A further anomaly exists with compounds whose first preverb is ei- 
ther to, fo or ro, which may equally assume their prototonic form in independent position if 
the following element starts with a vowel, causing vowel elision (McCone 1997: 3). Proclitic 
conjunctions such as co ‘that’ (GOI § 896) may be found with either independent or dependent 
verb forms, i.e. they sometimes assume the status of conjunct particle. 
Compound verbs may take up to four preverbs, each of which adheres to a
positional hierarchy tentatively formulated by McCone (1997: 89-90). Verb roots
cannot be arbitrarily compounded with any preverb. However, most verbs are liable to
being (further) compounded with an “augment”. While a lexical preverb in origin, 
the augment has developed a “modificatory function that belongs to the grammar 
of Old Irish and not to its lexicon” (McCone 1997: 91). This preverbal particle sup- 
plies either a resultative or potential meaning, depending on the tense and/or 
mood of the verb form that it occurs with, illustrated with (5) and (6), respectively. 


(5) ro·léic
    AUG-let.3SG.PRET
    ‘has let’


(6) as·robair (ess-ro-ber-)
    PV-say.AUG.3SG.PRES
    ‘can say’


The augment is most commonly ro (position 4), while the augments ad
(position 3) and cum (position 4) are more restricted, i.e. the latter two co-occur
with a limited set of (lexical) preverbs.⁸ For a discussion of the preverbal
particle ro and other augmentation strategies see GOI (§§ 526-537). Simple verbs
(which do not have a preverb) almost always take ro, rather than ad or cum.
The augment adds to the already abundant allomorphic variation seen in stem
formation and its position is subject to change during the Early Irish period, in
parallel with other processes of reorganisation and simplification of the verbal
system,⁹ most importantly the univerbation¹⁰ of old compound verbs.

The morphosyntax or morphotactics of the verbal complex, i.e. the legal 
combination of morphemes (Beesley and Karttunen 2003: 26-27), with optional 
morphemes in brackets, is schematically summarised in (7) (C = conjunct par- 
ticle, * = zero or more, with the provision that the total of preverbs does not 
exceed four, E = ending). Table 2 shows the schematic structure of the verbal com- 
plex with examples of preterite formations (with the conjunct third person singular 


8 ad and cum are underlying forms, subject to a substantial amount of allomorphic variation 
depending on whether they are stressed or not. 

9 Already in the Old Irish period, and during the Middle Irish period, ro is gradually adopting 
the status of conjunct particle, mitigating its effects in stressed position. The positional behav- 
iour of ro and its semantics is outside the scope of the present paper; the reader should refer to 
McCone (1997: 127-161) for a detailed description of this preverbal particle. 

10 A lexicalisation process involving the “unification . . . of a syntactic phrase or construction 
into a single word” (Brinton and Traugott 2005: 48). 
Table 2: Schematic structure, including the position of the augment ro, of the Old Irish verbal
complex, adapted from McCone (1997), with third person singular examples of unaugmented
and augmented preterite forms with root léc, illustrating combinatorial possibilities and
allomorphic variation in stem formation (C = conjunct particle, * = zero or more, with the
provision that the total of preverbs is not more than four). For Old Irish phonemes and their
graphemic representation see Stifter (2006: 377-379).


Lemma        No.    Structure                       Dependency          Ending   Example, pret. 3sg.
                    (RO = augment ro)                                            (bold = lexical element, italics = stressed syllable)

léicid       i.     VROOT E                         indep.              abs.     léicis /ˈlʲeːgʲəsʲ/
(simplex)                                                                        let.3SG.PRET
             ii.    C · VROOT E                     depend.             conj.    ní·léic /nʲiː ˈlʲeːgʲ/
                                                                                 NEG-let.3SG.PRET
             iii.   RO · VROOT E                    indep.              conj.    ro·léic /ro ˈlʲeːgʲ/
                                                                                 AUG-let.3SG.PRET
             iv.    C · RO VROOT E                  depend.             conj.    ní·reilic /nʲiː ˈrʲelʲəgʲ/ (ro-léc-)
                                                                                 NEG-let.AUG.3SG.PRET

do·léici     v.     PV1 · PV* VROOT E               indep. (deut.)      conj.    do·léic /do ˈlʲeːgʲ/ (to-léc-)
(compound)                                                                       PV-cast.3SG.PRET
             vi.    C · PV1 PV* VROOT E             depend. (protot.)   conj.    ní·teilic /nʲiː ˈtʲelʲəgʲ/ (to-léc-)
                                                                                 NEG-cast.3SG.PRET
             vii.   PV1 · PV* RO PV* VROOT E        indep. (deut.)      conj.    do·reilic /do ˈrʲelʲəgʲ/ (to-ro-léc-)
                                                                                 PV-cast.AUG.3SG.PRET
             viii.  C · PV1 PV* RO PV* VROOT E      depend. (protot.)   conj.    ní·tarlaic /nʲiː ˈtarləgʲ/ (to-ro-léc-)
                                                                                 NEG-cast.AUG.3SG.PRET
form having a zero ending); it illustrates how the stress pattern (phonology) of Old
Irish impacts on the verb morphology. The situation is slightly simplified in that ro
represents the augment; ro is the particle’s most common allomorph and the one
found with léicid ‘lets’ and do·léici ‘lets go, releases, casts’.


(7) (C) PV* (AUG) PV* VROOT E 
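The template in (7) is in effect a regular expression over morpheme classes, which is why finite-state methods suit it. As a rough illustration, the Python sketch below checks whether a sequence of slot tags conforms to (7). The function name is hypothetical, and counting only PV tags towards the limit of four (rather than PV plus AUG) is an assumption of this sketch; the chapter's own implementation is a finite-state analyser, not this code.

```python
import re

# Template (7) as a regular expression over space-separated slot tags.
TEMPLATE = re.compile(r"^(C )?(PV )*(AUG )?(PV )*VROOT E$")


def is_valid_complex(tags, max_preverbs=4):
    """Check a tag sequence against (C) PV* (AUG) PV* VROOT E.

    Assumption of this sketch: only PV tags count towards the limit of
    four preverbs; the augment is not counted.
    """
    if tags.count("PV") > max_preverbs:
        return False
    return bool(TEMPLATE.match(" ".join(tags)))


simple = is_valid_complex(["VROOT", "E"])
augmented = is_valid_complex(["C", "PV", "AUG", "PV", "VROOT", "E"])
too_many = is_valid_complex(["PV"] * 5 + ["VROOT", "E"])
misordered = is_valid_complex(["AUG", "C", "VROOT", "E"])
```

The check covers morphotactics only. It says nothing about which preverbs may combine with which roots, nor about the surface allomorphy shown in Table 2.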


3.2 Adjuncts and notae augentes 


The skeleton of the verbal complex, outlined in (7), allows for incorporation of
unstressed, clitic “adjuncts” (McCone 1997: 9) and notae augentes, illustrated
in this subsection with various examples.¹¹ Independent simple verbs allow a
suffixed pronominal object, e.g. -us in (8), illustrated with the verb benaid.¹²


(8) bentus
    strike.3SG.PRES-3SG.FEM
    ‘strikes her’


The alternative strategy is to employ infixation, as in (9), with the pronoun -m
attaching itself to the available proclitic (here ní). Infixed pronouns directly precede
the proclitic boundary and come in three classes (GOI §§ 409-427); the choice
between class A and B is phonologically conditioned, whereas the choice of class C
is syntactically conditioned. Simple verbs without a preceding
proclitic element acquire the “meaningless” preverbal particle no for purposes
including infixation of pronouns, illustrated in (10). Infixed pronouns are often
accompanied by following initial mutations (which are often not orthographically
marked due to the underspecified nature of the Old Irish spelling system).


11 Unless referenced explicitly, examples are either hypothetical or sourced from eDIL. 

12 The derivation is ben(a)ith + us with subsequent syncope (for which see 3.4) and delenition 
of th after n. 

13 For the initial mutations see GOI (§§ 229-244). Lenition is the pronunciation of consonants
with less acoustic energy. As Thurneysen has pointed out, scribal evidence of lenition in Old
Irish is initially confined to the letters p, t, c which turn into fricatives, marked by a following
h (ph /f/, th /θ/, ch /x/). Lenition of f and s is not indicated in the earlier glosses. Lenited f is
silent and may be omitted altogether in the spelling; lenited s represents /h/. In the course of
Old Irish, lenition is also marked on f and s by employing a punctum delens (ḟ, ṡ). Nasalisation
refers to the prefixing of n to an initial vowel and the homorganic nasal to b and g (mb /mb/,
ng /ŋg/), voicing of p, t, c and f (hardly ever expressed in the spelling) and gemination with s,
r, l, m, n when preceded by a proclitic vowel (not always marked in the spelling).
(9) ním·beir
    NEG-1SG-[LEN]carry.3SG.PRES
    ‘does not carry me’


(10) nom·beir
     PV-1SG-[LEN]carry.3SG.PRES
     ‘carries me’


Special relative endings exist for independent simple verbs for the absolute
third person singular and first and third person plural, e.g. (11). In other cases,
relativity is marked by an initial mutation following the proclitic preverb, as in
(12), or by suffixing -e/-a, followed by lenition, in the case of the preverbs im(m)
(exemplified in [13]) and ar.¹⁴


(11) léices
     let.3SG.PRES.REL
     ‘who lets, which (s)he lets’


(12) do·léici
     PV-[LEN]cast.3SG.PRES
     ‘who casts, which (s)he casts’


(13) imme·thét (imbi-teg-)
     PV-REL-[LEN]go.about.3SG.PRES
     ‘who goes about, which (s)he goes about’


The enclitic notae augentes occur in final position in the verbal complex and
reinforce an already present subject or object, as in (14).¹⁵


(14) at·beir-som (ess-ber-)
     PV-3SG.NEUT-[LEN]say.3SG.PRES=3SG.MASC/NEUT
     ‘he says it’


We arrive at the schematic overview in (15), loosely based on McCone (1997: 17) 
(C = conjunct particle, * = zero or more, with the provision that the total of 


14 Occasionally a/e appears with other preverbs (GOI § 493.4): reme- (for remi-), íarma (for
iarmi-, íarmu-) and assa· (instead of as-).
15 Examples of this form in the glosses are cited in Griffith (2008: 59). 
preverbs does not exceed four, E = ending, A = adjunct, N = nota augens; A and
N cannot occur together in an independent simple verb).


(15) indep. simple:         VROOT Eabs (A) (N)
     indep. simple augm.:   AUG (A) · VROOT Econj (N)
     depend. simple:        C (A) · (AUG) VROOT Econj (N)
     indep. compound:       PV1 (A) · PV* (AUG) PV* VROOT Econj (N)
     depend. compound:      C (A) · PV1 PV* (AUG) PV* VROOT Econj (N)


Taking together all inflectional forms across the tense/mood paradigms, we
arrive at about one hundred and twenty inflected forms per verb. If we include
affixed adjuncts, augments and notae augentes, the number grows by several
orders of magnitude. This “combinatorial” problem is compounded by
the fact that scribal practice was often to present the composite elements in the
verbal complex as a concatenative string. This results in segmentation
challenges, which will be addressed in 7.2.3. Essentially, a computational
framework should be able to identify the verb root and all its surrounding elements
in strings without mid-high dots, spaces and hyphens, as in nondobmolorsa in
(16), found in the Würzburg Glosses (Wb.) (Thes. 1: 593). This example contains
the first singular present indicative of the deponent verb (see 3.3) molaithir
‘praises’ with a nota augens, preceded by the meaningless preverbal particle
no. In the indicative and subjunctive tenses, no is used to infix relative n,¹⁶
signalling a nasalising relative clause (GOI §§ 497-504). This is what we have in
(16), enforced by the conjunction hore (óre, [h]úare) ‘because’ and realised by
nasalisation of initial d of the infixed pronoun.¹⁷


(16) hore     no  -n    -dob   mol     -or        -sa
     because  PV  -NAS  -2PL   praise  -1SG.PRES  =1SG
     ‘because I praise you’ (Wb. 14c18)
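To illustrate the segmentation task posed by (16), the Python sketch below splits an unsegmented string by walking through an ordered list of slots with backtracking. It is a toy: the slot inventory and the six-morph lexicon are hypothetical and cover only this one example, and the approach is plain recursive matching rather than the finite-state method introduced in Section 7.

```python
# Ordered slots: (slot name, {surface morph: gloss}, optional?).
# Hypothetical toy lexicon covering only example (16).
SLOTS = [
    ("particle", {"no": "PV"}, False),
    ("relative", {"n": "NAS"}, True),
    ("pronoun", {"dob": "2PL"}, True),
    ("root", {"mol": "praise"}, False),
    ("ending", {"or": "1SG.PRES"}, False),
    ("nota", {"sa": "=1SG"}, True),
]


def segment(s, i=0):
    """Return every analysis of `s` as a list of (morph, gloss) pairs."""
    if i == len(SLOTS):
        # All slots consumed: succeed only if no input is left over.
        return [[]] if s == "" else []
    _, morphs, optional = SLOTS[i]
    analyses = []
    for morph, gloss in morphs.items():
        if s.startswith(morph):
            for rest in segment(s[len(morph):], i + 1):
                analyses.append([(morph, gloss)] + rest)
    if optional:
        # Also try leaving this slot empty.
        analyses.extend(segment(s, i + 1))
    return analyses


analyses = segment("nondobmolorsa")
```

With a realistic lexicon the same string would typically receive several competing analyses, and the surface forms would first have to be related to underlying morphemes.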


16 PAEA 


17 For the initial mutation called nasalisation see fn. 13. 
3.3 A brief description of stem and ending formation


Apart from the rather small class of hiatus verbs, with roots ending in a vowel,
Old Irish exhibits an opposition of weak (W1/W2) and strong verbs (S1–S3), which
are classified according to present stem formation.¹⁸ Verbs have five stems:
present, subjunctive, future, preterite and preterite passive. Stem formation with weak
verbs is through largely regular and hence predictable suffixation. Strong verbs
show a combination of suffixation, vowel alternations (ablaut) and reduplication,
which are largely unpredictable unless one knows the underlying abstract root
shape (Stifter 2009: 96). For instance, crenaid ‘buys’ has a future 1sg. conj. -ciur,
which can be explained by reduplication of the abstract root cri > ci-cr ... and
subsequent lengthening of i to compensate for the disappearance of lenited
(fricativised) postvocalic c before r (GOI §§ 71, 691). While Old Irish verb morphology
abounds in complex allomorphic stem alternation, further complicated by analogy
(for an example see 7.2.2), the term “irregular” is arguably best reserved for
suppletion, i.e. usage of different roots across a verb's paradigm.

There are six groups of ending sets which are not arbitrarily combinable 
with the five stems (Stifter 2009: 96). Apart from the imperative and
“secondary” endings (used with the imperfect, past subjunctive and conditional), all
ending sets come in two series, i.e. absolute and conjunct, albeit only relevant 
for simple verbs (see 3.1). Both suffixation and stem-internal modifications are 
employed in ending formation. The latter comprise alternation of the root- 
vowel, the change of quality ([non-]palatalisation) of the root-final consonant 
and the insertion of u into the stem (“u-infection”, Stifter 2009: 67). 

There are separate inflectional endings known as deponent, used with a lim- 
ited set of verbs. While appearing as passives due to endings in -r, deponent verbs 
convey active meaning; the deponent property is therefore “merely lexical”, and,
consequently, “has to be known for each verb separately” (Stifter 2009: 87).


18 The classification system is the one used in McCone (1997). GOI employs A for weak (and 
hiatus) verbs, and B for strong verbs (with further subclassifications using Roman numerals). 
McCone's classification is used here as the letters W and S are more obvious designators for 
verb type, and a third letter H is reserved for hiatus verbs. Furthermore, McCone's
classification reflects a re-examination of inflectional patterns, more clearly showing
similarities between inflectional classes (using a subclassification system of Arabic
numerals followed, optionally, by the letters a, b and c). A conversion table is found in Stifter (2006: 381), who also
adopted McCone's classification system. 
3.4 Syncope


Syncope is the deletion of vowels in even-numbered, non-final syllables in words
with more than two syllables (Stifter 2006: 49). In verb forms, the syncope rule
operates from the first stressed syllable onwards, that is, the one immediately
following the proclitic juncture of the verbal complex. The addition of a nota augens (see
3.2) does not cause syncope. The effects of syncope are most pronounced in
compound verbs (GOI § 107), where alternation in stress causes much allomorphic
variation in the verb stem and the preverbs (see 3.1), e.g. (viii.) ní·tar†laic¹⁹ (to-ro-léc-)
in Table 2, with deletion of o in ro. A syncopated front vowel (e, i) results in a
palatalised consonant cluster, while a syncopated back vowel results in a non-palatal
cluster. The latter explains the surface form -tarlaic, where the consonant cluster
-rl- becomes non-palatalised because the syncopated vowel was a back vowel, with
verb root léc surfacing as laic /ləgʲ/. There are many attested instances of
irregularly applied syncope; an in-depth discussion of some irregular patterns is
provided in Ó Crualaoich (1999); see also 7.2.2 in the present chapter.
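The core of the syncope rule, vowel deletion in even-numbered non-final syllables, can be sketched in a few lines of Python. The function below is hypothetical and simplified. It expects the input already divided into syllables, starting at the first stressed syllable. It models vowel deletion only, so it does not derive the accompanying changes in consonant quality or vowel colouring, nor any of the irregular patterns mentioned above.

```python
VOWELS = set("aeiouáéíóú")


def syncopate(syllables):
    """Delete the vowels of even-numbered, non-final syllables.

    `syllables` starts at the first stressed syllable. Words of one or
    two syllables are returned unchanged.
    """
    if len(syllables) <= 2:
        return "".join(syllables)
    out = []
    for position, syllable in enumerate(syllables, start=1):
        is_final = position == len(syllables)
        if position % 2 == 0 and not is_final:
            # Keep the consonants, drop the vowel(s).
            syllable = "".join(ch for ch in syllable if ch not in VOWELS)
        out.append(syllable)
    return "".join(out)


stressed_part = syncopate(["to", "ro", "léc"])    # underlying to-ro-léc-
suffixed_form = syncopate(["be", "na", "thus"])   # ben(a)ith + us, footnote 12
```

For to-ro-léc- the sketch yields torléc, i.e. only the loss of o in ro; the attested -tarlaic additionally reflects the changes described above. For ben(a)ith + us it yields benthus, the stage before delenition of th after n gives bentus.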


4 The verb in Middle Irish, and beyond 


The Old Irish verbal system undergoes major changes in Middle Irish,
eventually resulting in a much-simplified inflectional system in Modern Irish. The key
Middle Irish developments are documented in detail in Breatnach (1994a:
278-325) and McCone (1997: 163-241). The changes between Early and Modern
Irish are summarised in a-c below.

a. Development of an immutable root shape and transparent stem formation, 
i.e. univerbation of compound verbs and, as mentioned in 3.1., the gradual 
development of ro as a conjunct particle (Breatnach 1994a; McCone 1997);

b. Replacement of affixed pronominal objects by independent object pro- 
nouns (Breatnach 1994a; McCone 1997); 

c. Homogenisation (and later renewal) of personal endings, the gradual emer- 
gence of independent subject pronouns (outside copula constructions) and, 
in conjunction with this, analytic verb forms (Breatnach 1994a; Greene 
1958; Greene 1973; McCone 1997; McManus 1994). 


19 The dagger denotes a syncopated vowel.


Developments a. and b. reach completion in Middle Irish, while the developments
in c., apart from the streamlining of present and preterite endings, are present in
embryonic form (subject pronouns) or take place, for the most part, in Early
Modern Irish (the development of analytic verb forms). A comprehensive discussion
of these pervasive changes is outside the scope of this paper, but some important
references have been provided.

The opposition of deuterotonic and prototonic and the associated morphophonemic
variation was largely done away with by creating new (generally weak)
simple verbs, mainly on the basis of old prototonic compound bases (McCone 1997:
192-193). This can be illustrated with do-léici, prototonic -teilci, developing
into the simple verb teilcid by analogy with the simplex: léicid : -léici =
x : -teilci, hence x = teilcid. A more extreme example of stem simplification is the
Old Irish compound verb do-sluindi, -díltai (di-slond-) ‘denies’, developing into
the Middle Irish simple verb díltaid (McCone 1997: 207-209), which is the basis
for the modern stem diúltaigh. These examples illustrate how a verb stem or
lemma can change beyond recognition between Old and Modern Irish.
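
The proportional analogy léicid : -léici = x : -teilci can be made concrete with a
small sketch. The Python below is illustrative only: the function name
solve_analogy and the suffix-replacement strategy are assumptions made for this
example and do not belong to any tool discussed in this chapter. The clitic
boundary is left out of the forms.

```python
def solve_analogy(model_a: str, model_b: str, target_b: str) -> str:
    """Solve model_a : model_b = x : target_b by suffix replacement.

    Toy proportional analogy: find the endings that distinguish the two
    model forms and apply the same difference to the target form.
    """
    # Length of the longest common prefix of the two model forms.
    i = 0
    while i < min(len(model_a), len(model_b)) and model_a[i] == model_b[i]:
        i += 1
    suffix_a, suffix_b = model_a[i:], model_b[i:]
    if not target_b.endswith(suffix_b):
        raise ValueError("target does not share the ending of the model")
    stem = target_b[: len(target_b) - len(suffix_b)]
    return stem + suffix_a


# léicid : léici = x : teilci
print(solve_analogy("léicid", "léici", "teilci"))
```

The sketch only covers analogies that differ in their endings; it says nothing
about why this particular analogy was productive in Middle Irish.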


5 Survey of digital resources and computational methods


5.1 Overview 


This section gives a survey of resources and tools to be incorporated in, or
potentially useful for, the author's research. The main goal is to illustrate the
under-resourced status of historical Irish. The introduction to the present
volume already documents the available lexical resources and corpora for Early
Irish. This section, therefore, focusses on resources for Modern Irish. The most
important digital resources are plotted on a timeline in Figure 1. A distinction is
made between lexicons and corpora, which are discussed separately in 5.2 and
5.3, respectively. The picture that emerges is one of fragmentation and,
especially in the case of lexicons, discontinuity. We are faced with a
“lexicographical gap” in the middle, roughly corresponding to the Early Modern
Irish period (13th-mid-17th centuries). Discussing modern scholarship and bardic
poetry, Mac Cárthaigh (2018: 28) observes that “we still lack such basic
infrastructure as a dedicated dictionary for the [Classical Modern Irish] period”.
Similar observations have been made in Griffith, Stifter, and Toner (2018), who
provide a comprehensive research survey on Early Irish lexicography. Subsection
5.4 provides a short excursion into Natural Language Processing for historical
texts, and efforts made so far in this area in the Irish context, paving the way
for the author's proposed methodology in section 6.

Figure 1: Visual representation of available digital linguistic support for historical periods of
Irish. Lighter shades denote lesser support.


5.2 Lexicons 


The Dictionary of the Irish Language (DIL) is the only dictionary that bridges
the Early and Modern Irish period and “its publication as an electronic resource
has been a great boon” (Stifter 2009: 59). However, the resource is not an ideal
starting point for an Old Irish morphological parser due to aspects of structure
and contents, inherited from the original hard copy. For example, the dictionary
is far from exhaustive in listing inflected forms. Other limitations, some of
which have meanwhile been remediated by the publication of the electronic
version, are discussed in Nyhan (2006). It should be added that the original
objective of the eDIL project was not to revise the original hard-copy dictionary,
but to open up the wealth of information contained in it and to make it
accessible to a variety of users (Fomin and Toner 2006).

The most important dictionary for Post-Classical Modern Irish is Dinneen
(1927), digitised versions of which were prepared in the context of a few
independent projects. Publicly available resources include a PDF version of the
first edition of the dictionary²⁰ as well as the online Irish-English dictionary,²¹
the latter allowing both English and Irish searches, including the option to be
directed to the relevant scanned page of the 1927 edition. The research goals of
another project, Digital Dinneen, bear resemblance to the goals of the present
work. The aim of this unfinished and dormant project was to create an edition
that could be integrated with (mainly) Early Irish resources, including an
XML-encoded electronic Lexicon of Medieval Irish (Nyhan 2006),²² eDIL and CELT.
The resulting infrastructure was envisaged to allow a user to follow a Modern
Irish form back to its earlier forms (Nyhan 2008). No tools were implemented,
but the Digital Dinneen project has produced a (not publicly available)
XML-encoded version of Dinneen (1927).²³


5.3 Corpora 


20 Available at: https://celt.ucc.ie/DinneenTsted.htm [accessed 30 January 2019].

21 Available at [accessed 30 January 2019].

22 While extensively documented in a Ph.D. thesis, this resource is, unfortunately, not available.
The following link to a sample of the Lexicon was kindly provided to me by Peter Flynn (email
dated 4 November 2014), former manager of the Academic and Collaborative Technologies Group
(ACTS), University College Cork IT Services: [accessed 4 May 2019].

23 For online information about this project, see https://celt.ucc.ie/digineen.htm [accessed
7 February 2019]. Further information was obtained by means of email contact with Beatrix
Farber (30/10/2014 and 03/11/2014), who had the initial idea for Digital Dinneen, and Julianne
Nyhan (23/2/2012), who informed the present author that neither a lookup mechanism nor a
search interface has been implemented.


The Irish Syllabic Poetry (or “Bardic Poetry”) corpus (c. 1200-c. 1650) consists of
approximately 2000 poems from the Classical Modern Irish period, including 500
previously unpublished ones edited in McManus and Ó Raghallaigh (2010).
Corpus preparation and annotation is a joint effort by the Irish Department in
Trinity College Dublin, the School of Celtic Studies (Dublin Institute for Advanced
Studies) and Dr Katharine Simms of the History Department in Trinity College
Dublin, who indexed the poems and has created a database, which is currently
being updated.²⁴ As part of the new project BARDIC@TCD (Eoin Mac Cárthaigh
and Elaine Uí Dhonnchadha), a POS-tagged corpus currently consisting of 500
syllabic poems has been made freely available and it will be updated regularly.²⁵

The tagging of the above-mentioned Bardic Poetry corpus employs automatic
standardisation techniques which had already been developed in the context
of Corpas Stairiúil na Gaeilge ‘Historical Irish Corpus’ (envisaged to comprise
90+ million words), constituting the basis for the Royal Irish Academy’s ongoing
project Foclóir Stairiúil na Gaeilge ‘The Historical Dictionary of Irish’ 1600-2000.²⁶
Uí Dhonnchadha et al. (2014) report on the adaptation of the “modern” tagging
tools for the second segment of this corpus (1882-1926), containing seven million
words, many of which are in a pre-standard orthography (before 1958; see also 5.4).


5.4 Natural Language Processing methods


Natural Language Processing (NLP)²⁷ is concerned with the ability of computers
to process human language (Jurafsky and Martin 2009: 35). The NLP
pipeline involves the following activities (in this order): tokenisation,²⁸
lemmatisation,²⁹ part-of-speech (POS) tagging³⁰ and syntactic parsing.³¹ A crucial
activity in the case of historical texts (and non-standard language in general)
is spelling normalisation, influencing all further language processing tasks
(Piotrowski 2012: 11). Standardisation of historical forms to arrive at modern
forms is best described as spelling modernisation (Piotrowski 2012: 69-70). The
term “canonical cognate” is used by Jurish (2010) to refer to the mapping of an
extant equivalent of a historical word that preserves the latter’s morphological
root and morphosyntactic features. However, sometimes the aim is not to map a
historical form to a modern form, but instead to a normalised or canonical
historical spelling. This typically involves dealing with both diachronic and
synchronic variation.


24 Available at: [accessed 7 February 2019].

25 The project website with a link to the corpus is found at https://www.tcd.ie/slIscs//researc
[accessed 17 July 2020].

26 Available at: The corpus is found at http://corpas.ria.ie/ [accessed 7 February 2019].

27 Alternative names for the field are Speech and Language Processing, Computational
Linguistics and Human Language Technology.

28 Separating punctuation marks and other non-alphabetic characters from words (Jurafsky
and Martin 2009: 67).

29 Grouping inflected forms of a word under its base form, i.e. its lemma (Mitkov 2003: 744).

30 Assigning a syntactic class marker (e.g. verb, noun) to each word in a corpus (Jurafsky and
Martin 2009: 167). POS taggers may be rule-based or trained on annotated data (e.g.
statistical), or both.

31 Parsing is a broadly defined concept in Speech and Language Processing that involves
taking an input form and producing a structured linguistic representation. Parsing can be done
on the morphological, syntactic, semantic and discourse level (Jurafsky and Martin 2009: 79).

Using NLP to deal with language variation in historical texts is far from 
straightforward: 


[T]here is no underlying computational model that describes how synchronic and dia- 
chronic variants relate to each other and — possibly - to some shared meaning or some 
kind of prototype that represents the relatedness of the variants (Piotrowski 2012: 9) 


Piotrowski (2012) has pointed out that historical language is inconsistent and
highly variable, which hinders POS tagging. The same author mentions various
ways of tackling this problem. Two common methods, often used in conjunction
with each other, are:

a. Bringing an older language variety in line with a standardised or normative
(typically modern) variety, using either rule-based or statistical
methods, and then using a “modern” POS tagger, if one exists.

b. Employing already existing lexical resources and creating mappings across
resources, i.e. through lemmas, dictionary headwords, etc.
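
A minimal sketch of method a., combined with a small word-pair table of the kind
a lexical resource might supply, is given below. Everything in it is an
assumption made for illustration: the names WORD_PAIRS, REWRITE_RULES, modernise
and tag_with_modern_tagger are hypothetical, the two toy entries are chosen only
to show the mechanism, and the tagger is a placeholder callable rather than any
existing tool.

```python
import re

# Toy data for illustration only (not taken from any project resource).
WORD_PAIRS = {"gaedhilge": "gaeilge"}              # historical -> modern spelling
REWRITE_RULES = [(re.compile(r"ughadh$"), "ú")]    # pre-standard -> standard ending


def modernise(token: str) -> str:
    """Return a modernised spelling: word-pair lookup first, then rewrite rules."""
    word = token.lower()
    if word in WORD_PAIRS:
        return WORD_PAIRS[word]
    for pattern, replacement in REWRITE_RULES:
        word = pattern.sub(replacement, word)
    return word


def tag_with_modern_tagger(tokens, tagger):
    """Normalise each token, then pass the result to a 'modern' tagger.

    `tagger` is any callable mapping a modern word form to a tag; the
    original token is kept alongside the tag it receives.
    """
    return [(token, tagger(modernise(token))) for token in tokens]


print([modernise(t) for t in ["Gaedhilge", "beannughadh", "agus"]])
```

Looking up known word pairs before applying rules keeps irregular items out of
the rule set; whether that ordering suits a given corpus is an open design
choice.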


In the Foclóir Stairiúil na Gaeilge ‘The Historical Dictionary of Irish’ project
(1600-2000), a morphological analyser and POS tagger for the standard language
(Uí Dhonnchadha and van Genabith 2006) are conjoined with a standardiser
(An Caighdeánaitheoir [Scannell 2009, 201]³²) employing rule-based and
statistical methods and a lexical database of historical and modern word pairs,
created by the project’s language experts (Uí Dhonnchadha et al. 2014). Initial
evaluation of the POS tagging of the 1882-1926 segment of the corpus pointed to
F-scores³³ ranging from 91-96% (Uí Dhonnchadha et al. 2014).

32 Code available at: https://github.com/kscanne/caighdean/ [accessed 10 March 2020].

33 A measure of a test's accuracy that incorporates “precision” (e.g. what percentage of the
items subjected to standardisation were correctly standardised?) and “recall” (e.g. what
percentage of items which should have been standardised were actually standardised?)
(Jurafsky and Martin 2009: 479).


Dereza (2018), who discusses lemmatisation approaches for ancient and
morphologically complex languages, reports that neither rule-based approaches
(using stems and affixes) nor statistical machine learning methods are useful for
Early Irish due to morphophonological complexity, non-transparent orthographical
features and scarcity of data. She has developed an Early Irish lemmatiser
using form:lemma mappings extracted from eDIL and compared two methods: 1)
an approximate matching approach using a lemma predictor based on the
Damerau-Levenshtein distance, checking for all possible strings of the forms on
edit distance 1 and 2,³⁴ and 2) a neural network approach learning character-level
sequences.³⁵ The first implementation of the lemmatiser shows 45.2% accuracy
(i.e. the percentage of correctly generated lemmas) with unknown words and
71.6% with known words, while the neural network metrics are 64.9% and
99.2%, respectively; the neural network approach thus greatly outperforms the
one based on edit distance.
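
The edit-distance idea can be sketched in a few lines of Python. This is not
Dereza's implementation: her lemmatiser is described as generating candidate
strings at edit distance 1 and 2, whereas the simplified sketch below compares an
unknown form directly against a table of known forms. The table FORM_TO_LEMMA is
a toy stand-in for form:lemma mappings extracted from a dictionary, and the
function names are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


# Toy form -> lemma table, for illustration only.
FORM_TO_LEMMA = {
    "léicid": "léicid",
    "léici": "léicid",
    "teilci": "do-léici",
    "epir": "as-beir",
}


def predict_lemma(form: str, max_distance: int = 2):
    """Return the lemma of the closest known form, or None if none is near."""
    if form in FORM_TO_LEMMA:
        return FORM_TO_LEMMA[form]
    best = min(FORM_TO_LEMMA, key=lambda known: osa_distance(form, known))
    if osa_distance(form, best) <= max_distance:
        return FORM_TO_LEMMA[best]
    return None


print(predict_lemma("leici"))
```

Ties between equally distant known forms are resolved here by table order, which
a real system would need to replace with a principled ranking.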


6 Proposed methodological framework 


In this section, the author will briefly point out how the interlinking of cognate 
verb forms is envisaged (see section 8), and how some of the resources described 
in section 5, together with a morphological analyser for Old Irish (section 7), will 
be employed to this end. The project's methodological framework is schemati- 
cally represented in Figure 2. 

Two morphological finite-state transducers (FSTs, see 7.1), located at the
opposite ends of the chronological spectrum, play a pivotal role in the envisaged
mapping of cognate historical (verb) forms. Both Old Irish and contemporary
standardised Modern Irish reflect stable and normative phases in the language’s
history and are (relatively) well resourced. For the modern standard language, a
morphological FST and a POS tagger are available (Uí Dhonnchadha and van
Genabith 2006). As illustrated in Figure 2, standardisation methods are
formulated relative to Old Irish and contemporary standardised Modern Irish.
Advanced computational standardisation methods are already successfully being
used for tagging Corpas Stairiúil na Gaeilge 1600-1926 (Uí Dhonnchadha et al.
2014) and the Bardic Poetry corpus, as discussed in 5.3 and 5.4.


34 Minimum edit distance, an approximate matching technique widely used in Natural 
Language Processing, calculates how similar two strings are by calculating the minimum num- 
ber of editing operations (insertion, deletion, substitution, transposition) needed to transform 
one string into another. In one of the most well-known variants, the Levenshtein distance, par- 
ticular costs are assigned to each of these operations (Jurafsky and Martin 2009: 74). 

35 Available at: [accessed 13 February 2019].


[Figure 2 (diagram): a timeline running from 700 A.D. through 900, 1200 and 1600 to 2000,
labelled Early Irish at one end and Modern Irish at the other, with an FST at each end connected
by lexical-level mappings; the Bardic Poetry corpus and Corpas Stairiúil na Gaeilge are placed
along the Modern Irish part of the timeline.]

Figure 2: Framework for automatic identification and linking of cognate Irish (verb) forms.
FST - finite-state transducer, see 7.1.


Seeing that much progress has already been made in tagging increasingly 
earlier historical Modern Irish forms, the present author is concentrating on the 
Early Irish side of the timeline. The following three tasks constitute the frame- 
work of the author's project: 

a. Building a morphological finite-state transducer (FST) for Old Irish, which 
can assist in future work on a POS tagger for this period. 

b. Creating lexical-level mappings between the Old Irish morphological analy- 
ser and the available tagging tools for Modern Irish. 

c. Employing standardisation methods and potential analyser/tagger adapta- 
tion, in conjunction with digital corpora, to cover the language periods bet- 
ween Old and Modern Irish. 


Task a. reflects the most novel approach in the author's research project. The
finite-state transducer can be augmented with manually parsed data from the
databases of the Old Irish glosses (currently being streamlined in CorPH, see the
introduction to this volume) and partial lemmatisation tables for verbs as present
in In Dúil Bélrai (King, Lash, and Gabay 2006). It should be noted that the present
work deals with morphological parsing rather than POS tagging. The task of
automatic morphological analysis is to present all the grammatical possibilities on
the word level. POS tagging is a subsequent task that aims at disambiguating
between morphological parses (e.g. is Old Irish ben a verb or noun?), commonly
on the basis of the phrasal context. Due to the highly synthetic nature of the Old
Irish verb, fine-grained morphological analysis is an essential prerequisite for POS
tagging as well as other subsequent tasks in the NLP pipeline for Old Irish.
Morphological parsing of the Old Irish verb is the topic of the next section.


7 Automatic morphological analysis and generation of Old Irish verb forms


7.1 Finite-state morphology 


Finite-state morphology is based on the mathematical notion of a finite-state 
automaton, a machine that recognises a particular set of symbol sequences 
(strings) as defined by a regular expression (a language for specifying text 
search strings, Jurafsky and Martin 2009: 17-18). Automata can be conceptual- 
ised as networks with transitions through a finite number of states. A finite- 
state transducer (FST) is an extension of this concept and contains two-level 
symbol correspondences for each path in the network. Figure 3 shows an FST 
with a mapping between a lexical-level and surface-level string representing 
present indicative third person singular absolute léicid. One of the advanta- 
geous features of this two-level formalism is that the relations encoded are in- 
herently bidirectional: an FST can be used in recognition mode to analyse 
(parse) orthographical words in a text, but it may also be used to generate, say, 
a specified set of inflected forms (listing, for instance, complete paradigms of 
verbs with root léc). Jurafsky and Martin (2009: 80) describe an FST as a “key
algorithm for morphological parsing ... and crucial technology throughout
speech and language processing”.


upper (lexical) level:   l    ē    c    +VROOT   +PRES   +IND   +ABS   +3P   +SG
states:                0    1    2    3        4       5      6      7     8     9
lower (surface) level:   l    é    i    c        i       d      ε      ε     ε

Figure 3: A finite-state transducer accepting, at final state 9, the surface string léicid (lower level)
and lexical string lēc+VROOT+PRES+IND+ABS+3P+SG (upper level), constituting a two-level
mapping. The epsilon (ε) denotes a so-called “empty transition”: a mapping where there is no
accompanying symbol on the opposite level, i.e. when the upper and lower strings are of unequal
length. The term analysis is used for upward mapping, which translates into morphological
parsing. Downward mapping equals generation of (most commonly) orthographical strings.
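
The bidirectional behaviour of such a transducer can be imitated with a short
Python sketch. It is a toy stand-in, not the author's foma implementation: the
symbol pairs are one reading of Figure 3, and the names PAIRS, TRANSITIONS,
generate and analyse are introduced here purely for illustration.

```python
EPSILON = ""  # empty transition: no symbol on this level

# (upper, lower) symbol pairs along the single path (an assumed reading
# of Figure 3).
PAIRS = [
    ("l", "l"), ("ē", "é"), ("c", "i"),
    ("+VROOT", "c"), ("+PRES", "i"), ("+IND", "d"),
    ("+ABS", EPSILON), ("+3P", EPSILON), ("+SG", EPSILON),
]

# TRANSITIONS[state] -> list of (upper, lower, next_state); state 9 is final.
TRANSITIONS = {i: [(up, low, i + 1)] for i, (up, low) in enumerate(PAIRS)}
FINAL_STATES = {len(PAIRS)}


def _apply(text, read_upper):
    """Return the output of every path that consumes `text` completely."""
    results = []

    def step(state, pos, output):
        if state in FINAL_STATES and pos == len(text):
            results.append(output)
        for up, low, nxt in TRANSITIONS.get(state, []):
            read, write = (up, low) if read_upper else (low, up)
            if text.startswith(read, pos):
                step(nxt, pos + len(read), output + write)

    step(0, 0, "")
    return results


def generate(lexical):
    """Downward mapping: lexical string to surface string(s)."""
    return _apply(lexical, read_upper=True)


def analyse(surface):
    """Upward mapping: surface string to lexical string(s)."""
    return _apply(surface, read_upper=False)


print(analyse("léicid"))
print(generate("lēc+VROOT+PRES+IND+ABS+3P+SG"))
```

The same transition table serves both directions, which is the property the
two-level formalism is valued for; a real network would of course contain many
paths and shared states.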




Beesley and Karttunen (2003) is an important reference work accompanied by a
toolkit called Xerox Finite State Tool (xfst).³⁶ This tool provides an extended set of
regular expression operators, including the conditional rewrite rule format used in
phonology, to intuitively model morphological and morphophonemic processes.
The lexicon compiler (lexc) program (Beesley and Karttunen 2003: 203-278)
facilitates and simplifies the creation of morphological grammars and can be used in
conjunction with xfst. The finite-state toolkit foma (Hulden 2009), which is freely
available³⁷ and compatible with xfst, is used by the author to develop a
morphological FST for Old Irish. The development of this tool is extensively
documented in the author’s Ph.D. thesis (Fransen 2019), which, together with the
code, is available online.³⁸


7.2 Implementation: Some highlights and challenges


7.2.1 Lexical and surface-level description


The two-level morphology paradigm is a fitting choice for the often daunting 
discrepancy between underlying and surface forms in Old Irish (verb) morphol- 
ogy, as detailed in section 3. The observation below is a suitable precursor to 
the computational challenges faced and choices made, as detailed in the re- 
mainder of this section: 


The bewildering complexities . . . become transparent only when viewed from a
diachronic position, and in order to understand allomorphic variation correctly it is
essential to work with underlying forms and their often quite dissimilar surface
representations (Stifter 2009: 60)


The two-level formalism does not prescribe which linguistic entities are to be as- 
signed to the upper level, although the latter is commonly reserved for synchroni- 
cally motivated underlying morphemes. The (final) surface-level forms, however, 
should obviously match against the (commonly) orthographical forms as found in 
a text corpus. 

36 The accompanying website is [accessed 7 February 2019].
37 Available at: [accessed 25 January 2019].
38 Available at: [accessed 10 March 2020].


Typically, the lexical level starts off with a lemma, and the surface level
with a stem. In many languages, the latter bears an obvious relation to the
former. This relation, however, is far from trivial in Old Irish, as pointed out by
Stifter (2009: 60). The full range of surface (inflected) forms often cannot be
deduced from a single inflected form across a verb’s paradigm, which means that
the conventional citation form provided in lexicographical resources and
grammatical descriptions of the language, the independent present indicative third
person singular (e.g. eDIL),³⁹ is of little use when it comes to formulating a
surface-level stem entry. This hurdle will be tackled in the next subsection.

The lexical level in the author’s FST consists of diachronically motivated
underlying morphemes rather than a citation form. In other words, a verb
form’s upper-level parse includes the abstract root shape of the verb and (with
compound verbs) the underlying form of the preverb(s). In addition to linguistic
motivations echoing Stifter (2009: 60), there are two practical reasons for the
author’s modus operandi. First, the use of diachronically motivated verb roots
enables one to generate (surface) verb forms which have the same historical
verb root. Secondly, employing “diachronic tags” allows for, and facilitates,
interoperability with projects dealing with other historical Indo-European
languages, or, indeed, Proto-Indo-European.⁴⁰

Example (17) illustrates the two-level encoding of the verb form as-oilgi
‘opens’ (L = lexical level, S = surface level), based on the derivation provided in
Stifter (2006: 364). The lexical-level tag +PROCL_JUNCT denotes the separation
between the proclitic(s) and the stressed part of the verbal complex. The
upper-level tag W2a can be added to enable extraction of verbs with this specific
stem type. Consecutively numbering the preverbs is also expected to facilitate
in-depth linguistic analysis; for example, it allows for a systematic investigation
of the positional hierarchy of preverbs.


(17) L  uss+PV1+PROCL_JUNCT+od+PV2+léc+VROOT+W2a+PRES+IND+CONJ+3P+SG
     S  as-oilgi


39 It should also be noted that headwords in eDIL are not consistently provided in a form
representative of Old Irish. An example is classical Old Irish deponent molaithir ‘praises’, which is
represented by “generic” Early Irish molaid in the dictionary. (A new entry, albeit solely
containing a reference to molaid, has been introduced in the revised 2019 version of the
dictionary; s.v. molaithir or dil.ie/50393.)

40 See, for example, Proto-Indo-European Lexicon, a generative etymological dictionary of
Indo-European languages, also implemented with the finite-state toolkit foma. Available at:
pielexicon.hum.helsinki.fi [accessed 7 February 2019].


Even though Old Irish can be treated as a normative phase within the medieval
period, the language is far from orthographically stable. In the current
implementation, the surface or lower level adheres, as closely as possible, to
classical Old Irish grammar and orthography. Orthographical variation in Early Irish
texts is expected to be successfully handled by one of Dereza’s (2018) lemmatiser
implementations (see 5.4) used as a standardiser (see 7.4).


7.2.2 Monolithic stems 


Section 3 has illustrated that a significant amount of allomorphic variation can 
be seen with verb stem formation, with syncope often causing truncation of the 
verb root, as in (17) above. This variation is challenging for a finite-state rule- 
based system, in which one typically starts off with a list of stems and affixation 
rules. Recall the morphotactics of the verbal complex (see example 7 in 3.1), re- 
peated here as (18). 


(18) (C) PV* (AUG) PV* VROOT E 


If we blindly applied the morphological concatenations without regard to
phonology, we would get, for instance, ní-to-ro-léc- (C-PV1-AUG-VROOT), where the
morphological derivation is quite far removed from the surface or orthographical
form ní-tarlaic (see [viii.] in Table 2 on page 55, and 3.4). Employing the above
concatenation schema to model Old Irish verb morphology was therefore not
considered a feasible starting point, even when equipped with knowledge
about the positional hierarchy of preverbs (McCone 1997: 89-90).

Allomorphic alternations are essentially a product of the morphology-phonology
interface in Old Irish, as has been demonstrated in section 3. In other
words, “unpredictable” stem formation is largely due to stress patterns,
including syncope. Looking at Table 2 in section 3.1, the examples that do not show
allomorphic stem variation are simple verb forms (i.)-(iii.) and the deuterotonic
compound with one preverb in (v.), exactly those forms that have a stressed verb
root. In all the other examples given in Table 2, where the root is unstressed ([iv.]
and [vi.]-[viii.]), stem formation is less trivial, at least from a computational
viewpoint and if operating with a set of clear-cut, synchronic rules.

The opposition of stressed versus unstressed verb root was found to be of
major significance in formulating verb stems.⁴¹ An additional base form is
required for any combination of a preverb or augment and an unstressed verb
root. A simple weak verb often requires an additional entry for the dependent
augmented form, such as reilic in the case of léicid. A weak compound verb
requires more stems; do-léici, for example, can be said to have four stems: (do)
léic, (do)reilic, teilic and tarlaic (see Table 2). While stems of weak verbs are
generally unmodified in the different tenses/moods, strong verbs may show
root-internal stem modifications in each of the five tense/mood stems; in other
words, the above-mentioned numbers for weak verbs should be multiplied by a
factor of up to five for strong verbs.⁴²


41 Note that this distinction does not fully coincide with the traditional binary oppositions of
simple versus compound and independent versus dependent.

Stem entries such as reilic, teilic and tarlaic are called “monolithic stems”
in the author's computational framework. These bases represent synchronically
motivated multi-morpheme strings not trivially segmentable on the surface.
Accordingly, they are not produced by diachronic phonological rules in the
author's FST rule framework, but keyed in as invariant stems in the lexc grammar.
Monolithic stems subsequently enable the encoding of straightforward
inflectional endings. While initially born out of programming considerations, the
concept of a monolithic stem is perhaps also theoretically insightful. When
these bases have been determined and encoded for a large number of verb
lemmas, the minimum or average number of stems necessary for operating with
simple morphological rules can be calculated, which could be an interesting
linguistic diagnostic for the level of complexity of the Old Irish verbal system.⁴⁴
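
The idea of keying in monolithic stems, rather than deriving them by rule, can
be pictured as a simple lookup table. The sketch below is a Python illustration,
not the author's lexc grammar: the four stems of do-léici are those listed in the
text, but the dictionary keys (stress pattern, augmented or not) and the function
verbal_complex are assumptions introduced for this example, and inflectional
endings are left out.

```python
# Monolithic stems of do-léici as listed in the text. The keys are an
# assumed classification: (stress pattern, augmented?).
STEMS = {
    ("deuterotonic", False): ("do", "léic"),
    ("deuterotonic", True): ("do", "reilic"),
    ("prototonic", False): ("", "teilic"),
    ("prototonic", True): ("", "tarlaic"),
}


def verbal_complex(stress: str, augmented: bool, particle: str = "") -> str:
    """Join an optional proclitic or conjunct particle with a stored stem."""
    if particle and stress != "prototonic":
        raise ValueError("a conjunct particle requires the prototonic stem")
    proclitic, stem = STEMS[(stress, augmented)]
    first = particle or proclitic
    return f"{first}-{stem}" if first else stem


print(verbal_complex("prototonic", True, particle="ní"))
print(verbal_complex("deuterotonic", True))
```

Counting the entries of such a table per lemma would give the stem-count
diagnostic mentioned above, once a large number of lemmas has been encoded.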

The formulation of monolithic stems partly remedies the problem of synchronically
opaque stem formation and alternation. However, dealing with syncope
remains a complicated aspect in the implementation. For example, syncope may
cause secondary palatalisation/non-palatalisation. Consider the independent
(deuterotonic) and dependent (prototonic) present indicative first person plural forms
of the strong verb as-beir (ess-ber-) ‘says’ in (19) and (20), respectively. The root
vowel e in ber has been subject to syncope in (20). The e in ending -em in (20), as
opposed to -am in (19),⁴⁵ marks subsequent secondary palatalisation of the
consonant cluster -pr- (/b’r’/). Example (21) is the dependent prototonic equivalent of
as-beir, with palatalisation of root-final r (/eb’ər’/) to mark the personal ending.
This form is not liable to syncope as it only consists of two syllables. The
mechanisms behind stem and ending variation of this kind occur throughout paradigms
of compound (and augmented simple) verb formations.


42 The concatenation of preverbal proclitics results in less allomorphic variation and can be
modelled using separately defined surface morphemes (more on the programmatic treatment
of proclitics versus (stressed) stems in 7.2.3).

43 But see the discussion on complications associated with syncope below.

44 I am thankful to Prof. David Stifter for bringing this additional insight to my attention. One
could ask the question how the complexity in Old Irish verb stem formation compares to other
languages. Such a cross-linguistic examination is, unfortunately, outside the scope of this
paper. It is not unlikely, however, that the perceived complexity of the verbal system is at least
in part due to the absence of a comprehensive synchronic description of Old Irish, and,
consequently, a framework employing transparent morphological rules. So far, scholars of Old Irish
have been mainly relying on historically oriented grammars such as GOI. A related issue, not
unique to Old Irish, is the fact that the description of a historical language is typically based on
a closed and often relatively small corpus; many forms across inflectional paradigms are not
attested, which may impede a full synchronic description of the morphological rules at play.


(19) as-beram
     PV-say.1P.PRES
     ‘we say’


(20) ní-eprem
     NEG-say.1P.PRES
     ‘we do not say’

(21) ní-epir
     NEG-say.3SG.PRES
     ‘does not say’


In the current implementation, syncope is incorporated in the framework of 
regular expression rules; a conditional rewrite rule targets vowels in even- 
numbered syllables (but not final ones), which are liable to syncope. A mono- 
lithic stem such as tarlaic should therefore be encoded as tarolaic (even though 
this form never surfaces) to make sure vowel syncope is correctly applied in 
subsequent even-numbered syllables. Monolithic stems are therefore perhaps 
best described as semi-surface forms. 
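
The behaviour of such a rule can be approximated in Python. The sketch below is
not the author's foma rewrite rule: it treats every run of orthographic vowel
letters as one syllable nucleus, deletes the nucleus of each even-numbered,
non-final syllable, and ignores palatalisation altogether. The function name
syncopate and the concatenation reilic + iset (the shape of the ending is
assumed here) are introduced for illustration only.

```python
import re

VOWELS = "aeiouáéíóú"
NUCLEUS = re.compile(f"[{VOWELS}]+")  # a run of vowel letters = one nucleus


def syncopate(form: str) -> str:
    """Delete the vowel(s) of every even-numbered, non-final syllable."""
    nuclei = list(NUCLEUS.finditer(form))
    pieces = []
    last_end = 0
    for number, match in enumerate(nuclei, start=1):
        pieces.append(form[last_end:match.start()])  # consonants before nucleus
        is_final = number == len(nuclei)
        if number % 2 == 1 or is_final:
            pieces.append(match.group())             # keep this nucleus
        last_end = match.end()
    pieces.append(form[last_end:])                   # trailing consonants
    return "".join(pieces)


for form in ["tarolaic", "reiliciset", "epir"]:
    print(form, "->", syncopate(form))
```

Encoding the stem as tarolaic rather than tarlaic is what lets a purely
position-based rule of this kind arrive at the attested surface shape.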

Unavoidably, “mechanical” treatment of syncope results in cases where the
resulting consonant cluster violates the phonotactics of Old Irish. While this
can be (and partly has been) counteracted by changing the conditional rewrite
rule, irregularly applied syncope is very hard to cater for. For example, the
augmented preterite third person plural surface form reilciset (-reil†ciset) generated
by the FST does not match attested -rel†c†set,⁴⁶ with syncope of the vowel in
the second as well as third syllable (the difference between the vowel sequence
e and ei in the first syllable is purely orthographical). Perhaps we should
operate with the stem reilc instead of reilic to arrive at what would then be regular
syncope of the vowel in the second syllable; however, the question in that case
is how to derive forms without syncope, such as expected and attested⁴⁷
preterite third person singular -reilic. Intra-paradigmatic analogy further complicates
a rule-based approach to syncope, as can be illustrated with dependent present
third person plural passive -epertar (expected *-ep†retar) of as-beir ‘says’
(ess-ber-), modelled on the present third person singular passive -eperr.⁴⁸


45 Apart from the different quality of the preceding consonant(s), both endings represent /o/.
46 This form is cited and discussed, alongside other examples of compounds with root léc, by
Ó Crualaoich (1999: 97-98).

The complexities relative to syncope and analogy (operating both within and
across paradigms) raise the question whether rule-based stem-and-ending
generation using monolithic stems is invariably more economical than manually
encoding (“hard-coding”) an entire verb paradigm. Strong verbs such as beirid and
as-beir are very frequent, and therefore more liable to irregularity and
analogical processes. For many other verbs, the distinction between “regular” and
“irregular” (or, perhaps better, “predictable” versus “unpredictable”) is not as
clear-cut, which makes it difficult to decide a priori whether automatic
generation or manual encoding is the more feasible approach. Establishing a good
balance between automatic and manual methods (based on expert knowledge) is
further complicated by the fact that no exhaustive list of Old Irish verbs or verb
roots exists, let alone a comprehensive overview of stem classification and stem
formation processes that could inform the formulation of monolithic stems.⁴⁹

In the author's project, the focus is initially on weak verbs; compared to the
group of strong verbs, weak verbs show transparent tense/mood stem formation
by means of suffixation only (see 3.3); in other words, one does not have to cater
for non-trivial stem-internal (non-concatenative) modifications based on an
abstract root. However, as the above has shown, most verbs need more than one
monolithic stem in any case if we want to cater for augmented simple verbs and
compound verbs.


47 eDIL s.v. léicid or dil.ie/29766.

48 For an overview of the entire inflectional paradigm of as-beir see Strachan (1929: 68-71).
49 Another problem is that some works deal with roots, and others with lemmas. Pedersen
(1909–1913, vol. 2) lists 204 roots (based on the dedicated number of paragraphs, 650–854).
However, the focus is on primary verbs (mainly verbs with Proto-Indo-European roots), which
are mainly strong verbs. A more up-to-date work on primary verbs is Schumacher (2004), who
lists 197 reconstructed Celtic verb roots, 166 of which are found in Irish verbs. However, this
work excludes causatives. Le Mair (2011) discusses weak verbs in the Old Irish glosses, giving
a total number of 365. McCone (1997) lists a good number of inflections in his index verborum
but does not include stem class and mainly considers material from the Old Irish glosses.
The online eDIL contains 4,127 verb headwords but does not systematically provide a stem
classification. This number includes duplicates as some verbs have a separate Old and Middle
Irish headword. Moreover, some (e)DIL headwords are more indicative of Middle Irish than
Old Irish (e.g. molaid, rather than Old Irish deponent molaithir ‘praises’). Rossiter (2004)
applied McCone's stem classification (the one adhered to in the present work) to verbs in DIL,
but only dealt with compound verbs. The vocabulary section in Stifter (2006) is far from
exhaustive but does systematically provide the stem class and roots for verbs.


7.2.3 “Word” segmentation and separated dependencies 


Morphological parsing operates on the word level, and words are defined as 
strings surrounded by space. A form such as beirid, with the ambiguous ending 
-id, will receive three grammatical analyses during morphological parsing, as it 
may occur in three grammatical contexts, illustrated in (22)–(24). The presence
or absence of a conjunct particle (here negative or negative imperative), if sepa- 
rated by space, is a disambiguating feature in the subsequent task of building a 
POS tagger (not part of the present paper), which operates beyond the word 
level, even if merely typographical. 


(22) beirid
carry3SG.PRES
‘carries’

(23) ní beirid
NEG-carry2PL.PRES
‘you do not carry’

(24) (ná) beirid (. . .)!
(NEG-)carry2PL.IMPV
‘(do not) carry (. . .)!’
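
The interplay between word-level ambiguity and context in (22)–(24) can be
sketched as follows. This is a hedged Python illustration, not part of the
author's FST or POS tagger: the table ANALYSES, its tag strings and the field
allowed_after are hypothetical, and the contexts are limited to the three shown
in the examples.

```python
# Word-level analyses of beirid, with the preceding particle that licenses each
# reading (None stands for "no preceding conjunct particle").
ANALYSES = {
    "beirid": [
        {"tags": "3SG.PRES", "allowed_after": {None}},
        {"tags": "2PL.PRES", "allowed_after": {"ní"}},
        {"tags": "2PL.IMPV", "allowed_after": {None, "ná"}},
    ]
}


def parse(form):
    """Return every word-level analysis, ignoring context."""
    return [a["tags"] for a in ANALYSES.get(form, [])]


def disambiguate(form, preceding=None):
    """Keep only the analyses compatible with the preceding particle."""
    return [
        a["tags"]
        for a in ANALYSES.get(form, [])
        if preceding in a["allowed_after"]
    ]


print(parse("beirid"))
print(disambiguate("beirid", "ní"))
print(disambiguate("beirid", "ná"))
```

Morphological parsing alone returns all three readings; only the step that looks
beyond the word boundary narrows them down.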


Morphological boundary markers, including spaces, are absent in faithfully 
transcribed texts and diplomatic editions. More commonly, text editions (which 
might make their way into a digital corpus) are subject to editorial choices and 
policy, according to which typographical morpheme boundary markers might 
be employed. The current finite-state implementation anticipates instances of a
potentially highly synthetic verbal complex written as one “word” (consecutive
string), optionally with a mid-high dot (for the proclitic juncture) or hyphens; it
accepts forms of the type nondobmolorsa, but also, for instance, nondob-molor-sa
(example (16) discussed in section 3.2). (A different yet interesting approach,
focused on pre-processing, is provided in Doyle, McCrae, and Downey (2019), who
explore the possibilities of automatic tokenisation for Old Irish using a
neural-network-based approach.) This choice in the implementation facilitates
recognition but obviously also results in a vast number of combinations to be
considered, of which only a limited number are morphotactically valid. The
restrictions are generally separated dependencies (co-occurrence of
non-consecutive morphemes) and most of these have been successfully encoded. The
generation of exclusively morphotactically valid forms prevents wrong parses due
to ambiguity at the surface level (e.g. identical absolute and conjunct endings).
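
The FST itself accepts the variants with and without typographical boundary
markers. As a hedged illustration of the same equivalence, the Python sketch below
takes a different route and strips the markers before comparison; the function
name and the choice of marker characters (mid-high dot and hyphen) are
assumptions made for the example.

```python
# Characters treated as purely typographical morpheme boundary markers.
BOUNDARY_MARKERS = str.maketrans("", "", "·-")


def strip_boundaries(form):
    """Remove mid-high dots and hyphens from a verbal complex."""
    return form.translate(BOUNDARY_MARKERS)


variants = ["nondobmolorsa", "nondob-molor-sa", "nondob·molorsa"]
print({strip_boundaries(v) for v in variants})
```

All three spellings collapse into a single string, which is the behaviour the
implementation achieves inside the network by making the markers optional.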

The interaction between monolithic stems (see 7.2.2) and separated dependencies
is illustrated in Table 3, using the same verbs as in Table 2.⁵⁰ By their
very nature, simplexes such as léicid cannot be preceded by a lexical preverb.
Compound stems of the type teilic or tarlaic preceded by the proclitic augment
ro or the proclitic preverb do are impossible as one or both of these elements
are already incorporated in the (monolithic) verb stem.⁵¹ However, ni (and
conjunct particles in general) can precede any stem except deuterotonic stems.

The author's lexc implementation contains separate lexicons for proclitics
(preverbs and particles, optionally with infixed pronoun and relative marker) and
verb stems with endings, which may occur typographically as strings separated by
space, and are recognised as such. The lexicons can be optionally concatenated.
In the author’s implementation, separated dependencies are partly encoded with
“flag diacritics” (Beesley and Karttunen 2003: 339–373), special regular expression
symbols accompanying morphemes (lexc entries) that either allow or block paths
in the network. Flag diacritics are not visible during analysis/generation, apply at
run-time, and can, together with the blocked morphotactically illegal strings, be
deleted from the network. For example, if do is marked as “preverb do seen”, and
prototonic teilic as “preverb disallowed”, we will never get, for instance, *do-teilic.
A simple verb such as marbaid ‘kills’, also accompanied by “preverb disallowed”,
will equally never be prefixed with do (or any other preverb). Deuterotonic stems,
however, have a flag diacritic of the type “requires preverb X”, which only allows
the specified preverb, and nothing else, not even a proclitic that is not a preverb.
Prototonic/simple stems, on the other hand, allow anything but a proclitic preverb,
and are (mostly correctly) preceded by proclitics that are not preverbs (e.g. ni).


50 It should be noted that there might be overlap in monolithic stems across verb lemmas (e.g.
reilic). In the current implementation, formulation of monolithic stems is on a per-verb (lemma)
basis. An approach whereby monolithic stems are used for multiple lemmas, while not
impossible, fails to make a distinction between, for instance, simple and compound verbs, which
are subject to different constraints. “Recycling” monolithic compound stems might be of use,
however, with verbs liable to preverb alternation (e.g. in-fét, ad-fét ‘relates’), secondary
composition (Stifter 2006: 254) and the employment of “dummy” preverbs in Middle Irish
(McCone 1997: 194–197).

51 ro-teilic and niro-teilic for do-reilic and ni-tarlaic, respectively, reflect a Middle Irish
development whereby ro gradually assumes the status of conjunct particle (see 3.1 and Breatnach
1994a: 279). The prefixation or infixation of proclitic ro with prototonic stems is blocked in the
current version of the FST; systematic encoding of Middle Irish features (such as the relaxation
of grammatical rules relating to proclitic ro) is envisaged as a subsequent adaptation stage of
the FST.


Table 3: Schematic overview of separated dependencies with a selection of proclitics and
monolithic stems in Old Irish, exemplified with léicid and do-léici.

LEXICON 1: Proclitic (“prefix”)
    Conj. part. | Preverb | Augment | Optional infix (rel. marker and/or pronoun class)
LEXICON 2: Monolithic stem (Simple/prototonic | Deuterotonic) | Ending

léicid (simplex): A or rel.(+C)
do-léici (compound): teilic-, tarolaic-; conj.; A or rel.(+C)

Flag diacritics have proven to be convenient for the encoding of (essentially
arbitrary) combinations of proclitic preverb and deuterotonic stem. The
disadvantage of flag diacritics, from a programmatic point of view, is the fact that
one needs to think carefully about separated dependencies in advance when
laying down the morphological concatenation architecture. Consequently,
“adding Flag Diacritics post hoc to an existing system can require non-trivial
re-editing of your source files” (Beesley and Karttunen 2003: 340). A sometimes
more convenient way of restricting the generation of ill-formed words is the use
of upper-level filters (Beesley and Karttunen 2003: 247–255), i.e. specifying
incompatible upper-level tags for an initially over-generating lexc grammar, and
filtering all the illegally formed strings out of the network.
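
The co-occurrence restrictions that the flag diacritics enforce can be mimicked
with a small Python check. This is only a sketch of the logic described above,
not lexc or foma syntax: the dictionaries PROCLITICS and STEMS, their labels, and
the use of léici as a deuterotonic stem entry are hypothetical simplifications.

```python
# Proclitics and what they "set": None (nothing), a conjunct particle,
# or a specific proclitic preverb.
PROCLITICS = {
    "": None,
    "ní": "conj",
    "do": "pv:do",
}

# Stems and the restriction they carry.
STEMS = {
    "teilic": {"type": "prototonic"},                   # "preverb disallowed"
    "marbaid": {"type": "simple"},                      # "preverb disallowed"
    "léici": {"type": "deuterotonic", "requires": "do"},  # "requires preverb do"
}


def is_valid(proclitic, stem):
    """Return True if the proclitic and the stem may be concatenated."""
    setting = PROCLITICS[proclitic]
    entry = STEMS[stem]
    if entry["type"] == "deuterotonic":
        # Only the specified preverb is allowed, nothing else.
        return setting == "pv:" + entry["requires"]
    # Simple and prototonic stems allow anything but a proclitic preverb.
    return not (setting is not None and setting.startswith("pv:"))


for proclitic, stem in [("do", "teilic"), ("ní", "teilic"), ("do", "léici"), ("ní", "léici")]:
    print(proclitic, stem, is_valid(proclitic, stem))
```

An upper-level filter would reach the same result from the other direction: first
generate every combination, then remove the ones for which a check of this kind
fails.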
7.3 Preliminary test results


The morphological FST was tested on the Old Irish text Táin Bó Fraích
[Cattle-raid of Fróech], using the digital version available on CELT,⁵² taken from
the edition by Meid (1974). The FST was augmented by personal names occurring in
the story, a limited set of function words, and the extremely frequent defective
verb ol ‘said’. It turned out that 9.6% of word types (unique forms) were
morphologically parsed, with a comparable average score of 10% for four other
Early Irish narrative texts edited by Greene (1955).⁵³ While the consistency of
these scores is a promising result, the main goal of this exercise was to see how
the FST would cope with weak verb inflection, which was concentrated on during
implementation. It should be noted that weak verbs were found to be rather
infrequent; in terms of tokens, W1 and W2a verbs constitute only 8.3% of the total
number of verb forms (excluding verbal nouns) in Táin Bó Fraích.⁵⁴

Out of the 50 W1 and W2a inflected forms (types) in Táin Bó Fraích, 36
(72%) were found to be correctly parsed. Most of the 14 non-recognised forms
either deviate from a “canonical” spelling or show idiosyncratic features that
are difficult to capture in general rules. Two verb forms in Táin Bó Fraích show
grammatical variation that perhaps legitimises a rule: present subjunctive third
person singular forruma (fo-ruimi, fo-rumai ‘puts’)⁵⁵ and preterite passive third
person singular relative arrálad⁵⁶ (ar-áili ‘arranges’) show fluctuation in
stem-final consonant quality, which is a feature of W2 verbs (McCone 1997: 27–28).

Spelling variation in and across texts such as the ones considered here is
often seemingly trivial, as can be illustrated with verbs with stem léic-, which,
first of all, may equally be spelled léc-. Another feature with no apparent
grammatical implication is the occurrence of the digraph ll in all three instances
of sentence-initial present indicative third person singular do-léici in Táin Bó
Fraích, for what is more commonly a single l in this context. Frequent
alternations of this kind raise the question whether, and to what degree, spelling
variation should be encoded as a final module in the FST framework. This and
related possibilities will be briefly discussed in the next subsection.


52 Available at: [accessed 7 February 2019].

53 Also available on CELT. The individual stories are Fingal Rónáin (available at:
https://celt.ucc.ie//published/G302011/), Orgain Denna Rig (available at:
https://celt.ucc.ie//published/G302012/), Esnada Tige Buchet (available at:
https://celt.ucc.ie//published/G302013/) and Orgguin trí mac Diarmata Mic Cerbaill
(available at: https://celt.ucc.ie//published/G100037/) [all accessed 7 February 2019].

54 Middle Irish simple weak formations from original compound verbs were excluded from
this count, as the FST does not (yet) deal with Middle Irish, e.g. fácbaid and oslaigid, from Old
Irish fo-ácaib ‘leaves’ and as-oilgi ‘opens’, respectively.

55 The headword fo-ruimi is given in Meid (1974) and eDIL (s.v. fo-ruimi, -fuirmi or
dil.ie/24043, under which attested third person singular present indicative forrumai is listed).

56 The nasalising relative marker appears on the final consonant of the preverb (-rr), rather
than on ál- (nál-), as expected. We would also expect the preverb to appear as ara-. Variation
of this type would have prohibited successful analysis by the FST in the first place.


7.4 Standardisation for Early Irish 


Beesley and Karttunen (2003: 287-293) recommend building different trans- 
ducers for different tasks so as to make the parsing pipeline as modular and 
flexible as possible. Such a pipeline typically includes a “standard” FST, where 
the surface level represents normative grammatical or orthographical forms. A 
separate transducer could be devised which can be secondarily invoked when a 
form does not conform to a standard grammar or spelling. The latter may take 
care of variation in spelling, as with, for instance, lécid for “standard” léicid. 

The lemmatisation tools being developed by Dereza (2018) are also expected
to be of benefit for standardisation purposes. As stated in 5.4, Dereza (2018) has
implemented an Early Irish Lemmatiser using two approaches: one method is
based on approximate matching using string similarity, the other uses neural
machine learning. The first implementation predicts a mapping from an unknown
inflected form in a text to a known variant, based on a dictionary of form:lemma
mappings originally retrieved from eDIL. The second, more state-of-the-art
and better performing version employs the latter mappings to learn character
sequences in order to produce lemmas it has never seen. The morphological FST
for Old Irish, currently being developed by the author, will, in time, surpass the
number of inflected verb forms listed in Dereza’s Lemmatiser dictionary.
Moreover, the inflected forms generated by the FST adhere to a large extent to
classical Old Irish inflection and spelling. By adding these canonical or
“standard” Old Irish forms to the known mappings in the Lemmatiser’s dictionary,
we not only increase the power of the Lemmatiser enormously, but we can also use
this resource as a spelling standardiser, namely, by mapping an unknown variant
in a text to a “standard” form from the FST and retrieving the morphological
parse of the latter. Lemmatisation and standardisation methods, in conjunction
with the author's morphological FST, have only been tested to a very limited
extent.
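
The idea of mapping an unknown variant to a “standard” form and then retrieving
its parse can be sketched with approximate string matching. The Python example
below is a stand-in only: it uses the standard library's difflib rather than
Dereza's Lemmatiser, and the dictionary STANDARD_PARSES, its tag strings and the
cutoff value are hypothetical.

```python
import difflib
import unicodedata

# Hypothetical "standard" forms with a morphological parse each.
STANDARD_PARSES = {
    "léicid": "léicid+V+3SG.PRES",
    "beirid": "beirid+V+3SG.PRES",
    "marbaid": "marbaid+V+3SG.PRES",
}


def standardise(variant, cutoff=0.8):
    """Map a spelling variant to the closest standard form and its parse."""
    variant = unicodedata.normalize("NFC", variant)
    matches = difflib.get_close_matches(
        variant, list(STANDARD_PARSES), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return matches[0], STANDARD_PARSES[matches[0]]


print(standardise("lécid"))
print(standardise("xyz"))
```

With this toy dictionary the variant lécid is mapped to léicid, while a string
that resembles none of the entries is left unresolved rather than forced onto a
poor match.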

The author has taken the liberty to use the term “standardisation” in his
framework (see section 6), to show the similarity with the approach taken in the
Foclóir Stairiúil na Gaeilge project. The terms “canonicalisation” or
“normalisation” are perhaps more fitting, as an absolute standard did not exist in
Early Irish, at least not in orthographical terms, not even in the otherwise
reasonably homogeneous language of Old Irish.


8 Suggestions for linking cognate verb forms 


The author's most fundamental envisaged approach is to create interoperability
between his own Old Irish morphological FST and the one for Modern Irish
(Uí Dhonnchadha and van Genabith 2006). Such an infrastructure could
incorporate mappings between lexical-level tags, i.e. between Old Irish preverbs
and verb roots and modern verb lemmas of the type léc+VROOT:lig+Verb+VTI
and to+PV1+léc+VROOT:teilg+Verb+VTI. Additional tag mappings between
inflectional categories could be devised, with the proviso that there is often a
discrepancy between Old and Modern Irish. For example, there is no
straightforward modern grammatical category that matches the Old Irish augment.
Although the modern past tense in many cases etymologically derives from a
perfect construction with the preverbal particle do (for earlier ro, the
resultative augment), it does not inherit either “perfectivity” or “perfect” as a
grammatical feature.

Nonetheless, tag mappings of this kind allow Old and Modern Irish paradigms
to be juxtaposed, facilitating research into historical roots and grammatical
developments such as innovatory processes in stem and ending formation. The
historical connection between lemmas such as lig ‘let’ and teilg ‘cast’ is not
present in the modern-language morphological FST. However, this connection can be
established by means of léc+VROOT, the “common denominator” for all verbs with
this root in Old Irish, for which individual paradigms can be generated. The
“modern” analysis additionally tells us that both lig and teilg can be used
transitively (+VTI), a feature which is expected, in many cases, to transfer back
to Old Irish.
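
How a shared root tag links otherwise unrelated modern lemmas can be shown with
the two mappings quoted above. The Python sketch below is illustrative only: the
dictionary holds just those two pairs, and the helper names are hypothetical
rather than part of either FST.

```python
from collections import defaultdict

# Lexical-level mappings quoted in the text: Old Irish side -> Modern Irish side.
MAPPINGS = {
    "léc+VROOT": "lig+Verb+VTI",
    "to+PV1+léc+VROOT": "teilg+Verb+VTI",
}


def old_irish_root(analysis):
    """Return the morpheme that immediately precedes the VROOT tag."""
    parts = analysis.split("+")
    return parts[parts.index("VROOT") - 1]


def modern_lemmas_by_root(mappings):
    """Group modern lemmas under their common Old Irish verb root."""
    grouped = defaultdict(list)
    for old, modern in mappings.items():
        grouped[old_irish_root(old)].append(modern.split("+")[0])
    return dict(grouped)


print(modern_lemmas_by_root(MAPPINGS))
```

Both lig and teilg end up under the root léc, which is exactly the connection
that the modern-language FST does not make on its own.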

Another “linking route” is through lemmatisation using droichead (Scannell
2018), a digitised version of the mappings between standardised contemporary
Modern Irish lemmas from Foclóir Gaeilge-Béarla ‘Irish-English dictionary’
(Ó Dónaill 1977) and eDIL headwords, originally prepared by de Bhaldraithe
(1981). Scannell (2018) added POS tags and used the imperative second person
singular (as in Ó Dónaill 1977) as the modern lemma rather than the third person
present indicative (matching the eDIL headword) in the original list. Since the
FST for Modern Irish (Uí Dhonnchadha and van Genabith 2006) employs the lemmas in
Ó Dónaill (1977) on the lexical/upper level, and droichead provides the
corresponding eDIL headword, mappings between any modern standard inflected
form (as well as many pre-standard forms, see 5.4) and the Early Irish eDIL
headword can be facilitated. Mappings between eDIL and (increasingly earlier)
Modern Irish inflected forms or headwords would be of great benefit to scholars
working on texts produced at various stages during the medieval period,
who are currently confronted with a vast range of grammatical and
orthographical variants while operating with limited lexicographical resources,
especially for Early Modern Irish.


9 Synthesis and future work 


The aim of the work is to link up lexical resources for the Early and the Modern 
Irish period. This chapter has identified a lack of digital linguistic resources for 
the historical Irish period, with a fragmentation and discontinuity in terms of 
lexicographical support, which makes the aim of the research, the interlinking 
of cognate verb forms, a far from straightforward process. This challenge is 
compounded by significant linguistic developments between Old and Modern 
Irish, mainly in the verbal system, and especially in Middle Irish. 

The computational methodology proposed employs two finite-state transducers
(FSTs) at opposite ends of the historical spectrum, Old Irish and contemporary
standardised Irish, as these language stages represent normative
and/or standardised varieties and are well resourced. Advanced methods are
being employed in the context of the Royal Irish Academy's Foclóir Stairiúil na
Gaeilge, using a standardiser in conjunction with a modern-language POS tagger
(based on an FST), greatly increasing recognition of increasingly earlier
historical variants and connecting the latter with the modern lemma.

The current focus in the author's project is on creating a morphological FST 
(and, subsequently, POS tagging tools) for Old Irish using the software foma 
(Hulden 2009). The FST is planned to be used in conjunction with a lemmatiser 
for Early Irish based on eDIL (Dereza 2018), which could be employed to predict 
canonical Old Irish inflected forms generated by the Old Irish morphological 
FST for orthographical variants in Early Irish texts, as such functioning as a 
standardiser. 

The challenges relating to a rule-based FST include morphophonemically
complex verb stem formation. Allomorphic stem variation and truncation of the
verb root, especially prevalent with compound verbs, have been tackled
computationally by devising multi-morpheme, non-derived units called “monolithic
stems” in the author’s work; these bases consist of the verb root and, if present,
preverbs and augment following the proclitic juncture. While the formulation of
monolithic stems is time- and knowledge-intensive, the resulting stem entries
reduce complex, non-concatenative stem and ending formation to relatively
straightforward morphological rules, which can be largely automated. Separated
dependencies have been successfully handled with instruments of the finite-state
paradigm.

Unpredictable inflectional patterns resulting from irregular syncope and 
analogy in inflectional patterns challenge a linguistically motivated, rule-based 
approach. A further issue is the absence of an exhaustive list of Old Irish verbs 
and information about stem type and stem formation. These conditions make it 
difficult to exactly establish the balance between automatic methods and man- 
ual efforts and expert knowledge needed. However, the concept of a monolithic 
stem strikes an interesting balance between “automatic” and “manual” and 
may well be a leap forward in establishing this balance. The incorporation of 
lexical resources such as the databases produced as part of the Chronologicon 
Hibernicum project and the lemmatised verb tables as part of In Dúil Bélrai will
likely speed up the development of the author’s FST. 

Test results are promising but incorporation of more verbs and verb classes 
as well as catering for inflectional variation and non-standard forms is an im- 
portant prerequisite in the context of establishing the feasibility of the imple- 
mentational choices and further development of the FST for Old Irish. 

Linking cognate verb forms across the entire historical period is very much 
future work. However, two methods have been proposed in this work. The first 
one involves mappings on the lexical level of the FSTs for Old and Modern 
Irish, facilitating the juxtaposition of entire historical paradigms based on Old 
Irish roots as well as systematic investigation of linguistic change. Alternatively, 
mappings between eDIL headwords and modern lemmas from Ó Dónaill (1977)
can be established by integrating the tagger (and standardisation tools, Scannell
2009, 2017) for Modern Irish (Uí Dhonnchadha and van Genabith 2006) and the
mappings as part of droichead (Scannell 2018).

Standardisation methods in conjunction with the Old Irish and Modern Irish 
morphological analysis/tagging tools will result in increasingly better coverage 
rates of intermediate variants. With the modern-language tagger “stretching 
back” and the one for Old Irish “reaching forward” we can metaphorically de- 
scribe the adaptation process as a “two-pronged attack”. It should be stressed 
that, in catering for variation throughout the medieval period, adaptation pro- 
cesses are likely to move beyond the realm of orthography. 

The substantial linguistic variation and change seen in the Middle Irish
period in particular will be an interesting challenge for both the “old” and the
“modern” FST/tagger. Dereza’s (2018) Early Irish Lemmatiser will definitely have a
role to play here, as it incorporates Middle Irish inflections given in eDIL; in
other words, we will (hopefully) arrive at the Early Irish eDIL headword.
Adaptation of the Old Irish morphological FST to deal with “Middle Irishisms”
is also a possibility. To properly deal with the verbal system of Middle Irish, a
list of univerbated verb stems is necessary. Alternatively, or as a
complementary approach, a list of prototonic compound stems from the Old Irish
morphological FST can be extracted and combined with weak simple verb inflection.
The latter will result in the generation of many non-existing or unattested
“new” simple verb formations. Overgeneration, however, is not a problem from
an analysis perspective (the grammatical analysis of an unattested form will
never come up) and will enhance recognition scores.

Further possibilities include incorporating tags for language variety or lin- 
guistic features (such as for Middle Irish) on the lexical level of the (adapted 
version of the) Old Irish FST. Encoding this information will provide us with a 
way to augment morphological analysis with automatic textual dating. An ap- 
plication could be to establish the proportion of Middle Irish, as opposed to Old 
Irish, forms in an Early Irish text. 
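
As a hedged illustration of such an application, the Python sketch below computes
the share of analyses that carry a language-variety tag. The tag +MidIr, the
analysis strings and the function name are hypothetical; the example forms are
taken from the discussion above, but their tagging here is for demonstration only.

```python
def middle_irish_proportion(analyses, tag="+MidIr"):
    """Return the fraction of analyses that carry the given variety tag."""
    if not analyses:
        return 0.0
    tagged = sum(1 for analysis in analyses if tag in analysis)
    return tagged / len(analyses)


# One analysis per recognised verb form in an imaginary text.
sample = [
    "léicid+V+3SG.PRES",
    "beirid+V+3SG.PRES",
    "marbaid+V+3SG.PRES",
    "fácbaid+V+3SG.PRES+MidIr",
]
print(middle_irish_proportion(sample))
```

A real application would have to decide how to count forms with several competing
analyses, which this sketch does not address.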

A more distant research prospect is the integration of POS taggers, data- 
bases, corpora and dictionaries into one lexical resource. Such a resource will 
hugely benefit scholars operating at the intersection of the Early and Modern 
Irish period, who now rely mainly on eDIL and Dinneen (1927), with no lexico- 
graphical facility that comprehensively spans the entire historical period. The 
author hopes to establish academic collaborations in the future to get a better 
grip on both the computational and linguistic challenges of his project. 
Christopher Guy Yocum
4 Text clustering and methods in the Book of Leinster


Most investigations of the Book of Leinster (hereafter LL) have used close reading,
historical, and philological techniques to identify authors within LL (for instance,
see Mac Gearailt 1993; Bhreathnach 2002; Mac Gearailt 1997-1998; Ó Lochlainn
1941-1942; Ó Lochlainn 1943-1944; Mac Eoin 1982: 113-114). While this has met
with some success, the methods used are by their nature idiosyncratic and prone
to individual scholarly opinion. One notable exception is Derick Thomson’s paper
The Poetry of Niall MacMhuirich, which attempts to use statistical methods to
attribute authorship of poems to Niall MacMhuirich (Thomson 1970). This paper will
use methods of anonymous authorship attribution, which have been developed
within the disciplines of machine learning and statistical analysis, to accomplish
two goals: first, to demonstrate the means and methods of unsupervised machine
learning techniques in early Irish literature and second, to discuss the implications
of the application of this methodology to LL with a view towards a larger research
project.

The paper will proceed in four stages. First, some scholarly literature concern- 
ing LL is reviewed. Second, the methods of data gathering, along with certain re- 
lated problems, as well as the algorithms used in the analysis are commented 
upon. Third, the outcome of the analysis is summarised. Fourth, the paper con- 
cludes with an examination of the contribution the analysis makes to the debate 
surrounding the authorship of LL. 


1 The context of LL 


LL, next to Lebor na hUidre (hereafter LU; Best and Bergin 1929), is one of the
great monuments of Irish literary culture and was written between 1151 and
around 1201 (Schlüter 2010: 24; see also Duncan 2012: 45-56). Most of the
manuscript is in the hand of Áed mac Crimthainn (usually cited as A); however, there
are five other discernible hands: F, T1i-4, M, U, and S (Duncan 2012; see also
Schlüter 2010: 27). Overall, there are 164 texts, which have 189,472 words in total.
The provenance of the manuscript is the subject of much debate. According to
Schlüter, LL was the product of the monastery at Cloneagh in Leinster and it may
have been moved to Náachongbáil for safety during the wars of the 12th century,
where it gained its medieval name. In addition, it was probably written for the
Loigis and celebrates their ancestor the Ulsterman Conall Cernach, which accounts
for much of the Ulster material in an otherwise Leinster book (Schlüter 2010:
30-35; see also Duncan 2012: 45-49).

A modern diplomatic edition of the entire manuscript was produced by 
Best, Bergin, and others (Best et al. 1954-1983). Moreover, many of the texts 
found in the manuscript have been published separately as critical editions (for 
instance, see O’Rahilly 1967). Currently, the entirety of the LL diplomatic edi- 
tion is available in TEI (TEI Consortium 2009) XML format at CELT. The CELT 
version of the diplomatic edition of LL is the basis for all work considered in 
this chapter. 

In an attempt to ascribe authorship to scribes of the versions of Táin Bó 
Cualnge, Cath Ruis na Rig, and Mesca Ulad found in LL, Mac Gearailt breaks up 
Táin Bó Cualnge into “regions”, then performs some statistical analysis on the or- 
thography and language of each region (Mac Gearailt 1993: 172-178). He concludes 
that scribes A and T intervened in the text and made their own contribution (Mac 
Gearailt 1993: 205). In a later study, Mac Gearailt (1997-1998: 405) attempts to date 
the Táin Bó Cúalnge by counting and cataloguing infixed pronoun usage and style
and concludes that: 


Finally, it may be noted that while non-historical infixed forms in the LL Tain can be
assigned to the period when CRR-LL was composed, many others which conform fully to
OIr. or Mid. Ir. rules are survivals from a much earlier stage of Recension II...


Mac Eoin (1982) discusses authorship in terms of unreliability of dating. In this 
he laments the situation but holds out hope that the separation of prose and 
poetry could assist in distinguishing between the original and any additions: 


But how are we to judge the validity of attributions to poets who fall within the Middle 
Irish period? It is often assumed that ascriptions to Middle Irish poets in Middle Irish 
manuscripts like LU, LL, and Rawl. are reliable. Some would certainly seem trustworthy, 
but their reliability is not enhanced by ascriptions to Cormac macc Airt, Medb Lethderg, 
and Ailill Oluimm on the adjoining pages. (Mac Eoin 1982: 124) 


Ó Lochlainn (1941-1942) and (1943-1944) attempts to use textual sources to se- 
cure attribution of authorship to poems traditionally ascribed to Mac Coise. This 
is done by using various features of Middle Irish to date the poem and the date of 
Mac Coise’s death in the early Irish annals to demonstrate that Mac Coise could
not have written the poems under consideration. This position was re-evaluated 
by O’Leary (1999) who argued that there were three different people named Mac 
Coise whose poems can be securely ascribed: Airbertach mac Cosse Dobrain, 
Iorard mac Coisi, and mac Coisi (O'Leary 1999: 69-71). 

As the examples above show, with few exceptions the main method of modern
scholarly authorship attribution is to use stylistic dating, together with dating
by reference to annals or other sources of textual evidence, to assign or at least
question the authorship attributions made by the scribes of various early Irish
manuscripts.

With this in mind, the next step is to begin with the mechanics of how to 
use unsupervised machine learning techniques on early Irish literature using 
LL as an example. 


2 LL as a list of vectors 


While the foregoing has set the scene for the current state of the scholarly de- 
bate, this section will explore the various methods of data gathering and analy- 
sis. This section is by necessity highly technical in nature and will require close 
attention to the means by which a written text can be transformed into a math- 
ematical object. 

Many of the texts which appear in LL (e.g. Lebor Gabála) also appear in
other Irish manuscripts, which would suggest that other manuscript versions 
should also be included in the analysis. While this is an area for future research, at 
the moment for the ease of analysis and modelling, other manuscript versions 
which are available on the CELT website have not been included. Thus, LL and its 
texts are the only ones under analysis in this paper. 


2.1 Dividing LL into texts 


There are several legitimate ways of viewing LL: as an indivisible complete work, 
or as the work of a group of scribes, or as a conglomeration of separate texts, or 
as the six-volume set as prepared by Best et al. (1954-1983). If LL is viewed in the 
first way, the analysis in this paper would not be possible as there would only be 
one text to analyse and the method proposed would not work. If LL is taken as 
the second, texts would need to be split by hand rather than by title. A slightly 
modified form of this analysis is attempted in the course of this chapter. If LL is 
viewed as the modern six-volume set, it would, much like the indivisible com- 
plete work, contain too few texts to analyse using the proposed method. Thus, 
for the purposes of this paper, LL will be viewed as the third type, a collection of 
separate texts collected into one whole work by a group of scribes. This policy
accords well with Duncan’s (2012: 28) argument on the composition of LL: 


To regard Lebor na Nuachongbála as a single manuscript does not allow for its complexity
either physically, palaeographically, or textually, and gives the impression that it was 
written at the same time and place in a straight run. 


The CELT XML edition is composed of TEI XML files which correspond to each
published volume of Best et al. (1954-1983). Within these files each text is
delimited by a <div1> XML tag. Each of these was extracted using two XSLT
documents in succession with the SAXON XSLT engine. XPath was then used to
extract the textual content and place it in files named after the <div1> tags,
which eased further analysis. All computer code to accomplish this is available at
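
The extraction step can be illustrated with a minimal sketch. It is written in Python with the standard library rather than with the XSLT documents and SAXON engine used for the chapter, so it is not the pipeline described above. The `n` attribute used as a label, the fallback label `text_NNN`, and the sample document are assumptions made purely for illustration; the real CELT files may identify each text differently.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a namespace prefix such as '{http://www.tei-c.org/ns/1.0}div1'."""
    return tag.rsplit("}", 1)[-1]


def extract_div1_texts(xml_string):
    """Return {label: plain text} for every <div1> element in a TEI document."""
    root = ET.fromstring(xml_string)
    div1_elements = [el for el in root.iter() if local_name(el.tag) == "div1"]
    texts = {}
    for index, element in enumerate(div1_elements, start=1):
        # Assumption: use the 'n' attribute as a label when it is present.
        label = element.get("n") or f"text_{index:03d}"
        # Joining with spaces keeps headings and paragraphs apart, but it would
        # also split a word that contains inline markup.
        texts[label] = " ".join(" ".join(element.itertext()).split())
    return texts


# Invented sample, not taken from the CELT edition.
SAMPLE = """<TEI><text><body>
<div1 n="sample_a"><head>Sample A</head><p>in fer 7 in ben</p></div1>
<div1><p>ro boí   rí amra</p></div1>
</body></text></TEI>"""

extracted = extract_div1_texts(SAMPLE)
for label, text in extracted.items():
    print(label, "->", text)
```

Each value in `extracted` could then be written to its own file, which corresponds to the one-file-per-text layout described above.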

Not all texts as extracted by the method above were used in the subsequent
analysis. The texts that were not included are:

- Haec sunt nomina virorum componentium lapides
- Lebor Gabála Érenn
- Togáil Troí
- Prose Dindsenchas
- Metrical Dindsenchas
- All Genealogies
- All King Lists


As the reader will undoubtedly notice, most of the excluded texts are king lists or
other kinds of lists. These texts do not give enough of the kinds of information
necessary to create a good representation of the texts that are of interest to this
kind of analysis. In addition, as will be shown below, they may distort the outcome
of this exercise. Moreover, Lebor Gabála has been excluded because it has a complex
textual history of its own and may confuse the analysis (Scowcroft 1987: 81-89). The
consequence of this policy is that texts which are identified by author within Lebor
Gabála are not included. For instance, Flann Mainistrech’s poem Éstid a eolchu cen
ón and Gilla Cóemáin's Goedel Glas ó tat Goídil and Tigernmas mac Follaig aird are
not included in the analysis. As Peter Smith (2007: 27) states: “Both Goedel Glas ó
tat Goídil and Tigernmas mac Follaig aird appear to have formed an intrinsic part of
Lebor Gabála since their composition.” Including texts which have identified authors
but are embedded within Lebor Gabála would distort the analysis, as those texts
would be mixed with the work of whoever transmitted the LL recension of Lebor
Gabála. Similarly, Togáil Troí was excluded due to its own complicated textual
history, as described in Mac Gearailt (2016). The Prose and Metrical Dindsenchas are
excluded because these texts are similar in structure to Lebor Gabála in that they
contain component texts. Additionally, the prose and metrical versions appear
together in most versions of the text (Theuerkauf 2017: 49-50). Attempting to
extract and examine these texts would overburden the methodology to the detriment
of the illustrative purposes of this chapter. A coherent method of extracting these
kinds of texts from their surrounding textual context, both in a single manuscript
and in many versions across manuscripts, deserves a far more thorough examination
than can be accomplished here. Having such a methodology would allow an
unsupervised machine learning method to be applied while remaining faithful to the
history and context of these texts. Finally, there are three texts which were not
included in the CELT XML of the Book of Leinster but were used in the analysis,
namely Táin Bó Cúalnge, Fingal Rónáin, and Esnada Tige Buchet. These were
supplemented from additional CELT files.


2.2 From texts to vectors 


There are two generally accepted forms of text tagging for anonymous author 
attribution (Juola 2008: 262-266). First is part-of-speech tagging (hereafter, 
POS). This form of analysis uses a set of tags which mark the text for parts of 
speech. Using a method called Maximum Entropy, other untagged texts of the 
same language can be POS tagged (Jaynes 1957a, 1957b). Middle Irish does not 
have a POS tagger at the moment. POS tagging without an automatic POS tag- 
ger is extremely time-consuming and would be impossible in this instance. 
Lash (2014a) has constructed a corpus of POS tagged texts in Old/Middle Irish, 
which could form the basis for a POS tagger in the future. Additionally, there is 
the new Corpus Palaehibernicum (CorPH) (see the introduction to this volume), 
which could also help in this regard. However, automatic POS tagging can itself
introduce errors:


. . . especially for POS taggers, is the introduction of errors in the processing itself; a system 
that cannot distinguish between contraction apostrophes and closing single quotes or that 
can only tag with 95% accuracy will conflate entirely different syntactic constructs,
muddying the inferential waters. (Juola 2008: 265)


The second type is function word tagging. Famously, function word tagging 
was used in identifying the authorship of the Federalist Papers (Mosteller and 
Wallace 1963). The Federalist Papers were a set of anonymously written essays
to promote the ratification of the Constitution of the United States of America. 
Function word tagging is less time-consuming and a proven way of identifying 
anonymous authorship and was therefore the chosen method for this exercise. 

Function words are words which have no lexical meaning in a sentence and serve
only to structure the sentence grammatically. In English, these include words like
of, and, a, an, etc. In early Irish, they include, but are not limited to, pronouns
(si ‘she’), prepositional pronouns (duit ‘to you’), conjunctions (7), and definite
articles (in, int ‘the’). Difficulties arise when infixed pronouns are encountered
in the various texts. Infixed pronouns were left out of this analysis on the grounds
that it would be difficult, though not impossible, to add them cleanly without
disturbing the creation of the document vector, on which more below.

In LL, there are 1,125 different categories of function words. The number of
categories is large because no normalisation is performed when the words are
counted: variant spellings and initial mutations of words are left unnormalised.
Thus n-uile ‘all’ is counted separately from plain uile ‘all’. The consequences of
this choice will be explored later as it has a bearing upon the mathematics
involved. Of the 189,472 words, as mentioned above, a total of 56,513 are function
words, which means that there is an average of 344.59 function words per text. As
there are so many function words, listing them here would be impractical, so the
total raw frequencies can be found online.¹
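
As a hedged illustration of counting without normalisation, the sketch below treats every distinct spelling as its own category, so that n-uile and uile are tallied separately. The short function-word set and the sample sentence are invented for the example; the chapter's real list has 1,125 entries. The final line reproduces the per-text average from the totals quoted above.

```python
from collections import Counter

# Hypothetical, very small function-word list used only for illustration.
FUNCTION_WORDS = {"7", "in", "int", "sí", "duit", "uile", "n-uile"}


def count_function_words(text, function_words=FUNCTION_WORDS):
    """Count function words exactly as spelled: no lowercasing, no de-mutation."""
    return Counter(token for token in text.split() if token in function_words)


counts = count_function_words("in fer 7 in ben 7 na n-uile doíni uile")
print(counts)

# Average number of function words per text, from the figures in the chapter.
average = 56513 / 164
print(round(average, 2))
```

Because no spelling is folded into another, adding a new variant to the list adds a new dimension to every document vector built later.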

Once the tagging is finished, the tf*idf, which stands for term frequency times 
inverse document frequency, is calculated as shown below. Term frequency in the 
formula tf*idf means that the frequency of function words in a document is a 
major factor in determining the author of the document, which according to Zhong 
and Ghosh (2005) gives the best results for this kind of analysis. 

Let D be the set of all documents under consideration and N be the number 
of documents in the set. For LL, N= 164. The normalised frequency tf(t,d) of a 
term t in a document d is computed thus: 


$$\mathrm{tf}(t, d) = \frac{f(t, d)}{\max\{f(w, d) : w \in d\}}$$


In other words, the term frequency of a term in a document is the number of
times that term appears in that document, denoted f(t, d), divided by the maximum
raw frequency of any term in that document, denoted max{f(w, d) : w ∈ d}.
This includes non-functional terms (in other words, terms that have semantic
meaning: nouns, verbs, etc.).

The inverse document frequency, idf, is then computed thus: 


$$\mathrm{idf}(t, D) = \log\left(\frac{N}{|\{d \in D : t \in d\}| + 1}\right)$$

In other words, the idf is the logarithm of the number of documents in the corpus
divided by one more than the number of documents in which the term t appears. It is
common to adjust for the fact that term t may not appear in any document, and thus
one is added to the denominator, which avoids division by zero.

Finally, the tf*idf is calculated thus: 


$$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$$


This is calculated for each possible function word in a text. The tf*idf is well- 
defined for all words in the corpus but only function words are of interest here. 
Not all function words occur in all texts; if a term t does not occur in a
document d, then tf(t, d) is 0 and hence tfidf(t, d, D) is also 0.
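
The three definitions above can be checked on a toy corpus. The sketch below is an illustration only: the four short "documents" are invented, the natural logarithm is assumed because the chapter does not state a base, and an empty document is not handled.

```python
import math
from collections import Counter


def tf(term, doc_counts):
    """Count of `term` divided by the count of the most frequent term in the document."""
    return doc_counts[term] / max(doc_counts.values())


def idf(term, all_doc_counts):
    """log(N / (number of documents containing `term` + 1))."""
    containing = sum(1 for counts in all_doc_counts if counts[term] > 0)
    return math.log(len(all_doc_counts) / (containing + 1))


def tfidf(term, doc_counts, all_doc_counts):
    return tf(term, doc_counts) * idf(term, all_doc_counts)


# Invented toy corpus of N = 4 documents.
docs = ["in fer 7 in ben", "7 in rí", "ro boí rí amra", "int ech 7 int ech aile"]
corpus = [Counter(doc.split()) for doc in docs]

weight_in = tfidf("in", corpus[0], corpus)    # 'in' occurs in 2 of 4 documents
weight_7 = tfidf("7", corpus[0], corpus)      # '7' occurs in 3 of 4 documents
weight_int = tfidf("int", corpus[0], corpus)  # 'int' is absent from document 0
print(weight_in, weight_7, weight_int)
```

One consequence of adding one to the denominator is worth noting: a word found in all but one document receives an idf of exactly 0, and a word found in every document receives a slightly negative idf, since log(N / (N + 1)) is below zero.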

We pick an ordering $t_0, t_1, \ldots, t_n$ of function words, which, while random,
must be fixed as discussed below, and an ordering $d_0, \ldots, d_{N-1}$ of
documents, and for each document $d_j$ we form a document vector:


$$(w_{0,j}, w_{1,j}, \ldots, w_{n,j})$$


where $w_{i,j} = \mathrm{tfidf}(t_i, d_j, D)$. The document vector is a list of the
tf*idf values as defined above.

The fixed ordering of words allows comparisons across documents. For in- 
stance, if 7 (ocus ‘and’) is first in the list, then the tf*idf for 7 would be the first 
component of any document vector. Putting this all together, LL gives rise to a 
list of vectors (or a list of lists) of tf*idf values for each document. A list of vec- 
tors is called a matrix. It is an interesting feature of the LL that this matrix is 
sparse, as many of the entries in the vectors are 0. 
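
A brief sketch of how a fixed ordering turns per-document weights into a matrix, and of how sparsity can be measured, is given here. The term order and the weights are made-up values chosen for readability and are not results from LL.

```python
def build_matrix(weights_by_doc, term_order):
    """One row per document, one column per term, always in the same `term_order`."""
    return [[weights.get(term, 0.0) for term in term_order] for weights in weights_by_doc]


def sparsity(matrix):
    """Share of entries that are exactly zero."""
    entries = [value for row in matrix for value in row]
    return sum(1 for value in entries if value == 0.0) / len(entries)


term_order = ["7", "in", "int", "uile"]  # arbitrary, but fixed for every document
weights_by_doc = [                       # hypothetical tf*idf values
    {"7": 0.1, "in": 0.3},
    {"int": 0.7},
    {"7": 0.2, "uile": 0.4},
]

matrix = build_matrix(weights_by_doc, term_order)
for row in matrix:
    print(row)
print(sparsity(matrix))
```

Because column 0 always belongs to 7, the first components of any two rows can be compared directly, which is the point of fixing the ordering.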


2.3 From matrix to clusters: k-medoids 


Once the matrix of the tf*idf of each function word which appears in a particular 
text has been calculated, the entirety of LL is ready for the next stage in its trans- 
formation. There are many different means of taking the digitised corpus and de- 
termining the possible clusters. The most common of these and the one that will 
be used here is called k-medoids, which was first introduced in Kaufman and 
Rousseeuw (1987) (see also MacQueen 1967). Additionally, while there are nu- 
merous statistical packages available to complete the last leg of the journey from 
text to mathematical object, the technical computing programming language 
Julia was chosen to compute and graph the final results (Bezanson et al. 2012). 
The k-medoids algorithm uses a distance metric to partition the matrix (Park 
and Jun 2009). In this case, the cosine distance is used (see Tan, Steinbach, and 
Kumar 2005: 500 and Singhal 2001). The optimal partition is then found and all
texts are placed in the optimal clusters based on their cosine distance from each 
other and the possible number of clusters. In other words, texts are placed to- 
gether in one cluster when the algorithm determines that their vectors are close 
to each other. The output of the algorithm is called a clustering solution. The clus- 
tering solution must then be interpreted, which means that the results may dem- 
onstrate author attribution, genre, or scribal activity (Stamatatos 2009: 23). All 
these possibilities will be explored below. 
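
To make the procedure concrete, the following is a minimal sketch of k-medoids with cosine distance, using a simple alternating scheme (assign each vector to its nearest medoid, then move each medoid to the member with the smallest total distance to the rest of its cluster). It is not the Julia code used for the chapter; the function names, the hand-picked initial medoids, the treatment of all-zero vectors, and the six toy vectors are all assumptions for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; all-zero vectors are given distance 1.0 by convention here."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assign(dist, medoids):
    """Label every point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(vectors, initial_medoids, max_iterations=100):
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    medoids = list(initial_medoids)
    for _ in range(max_iterations):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:              # keep the old medoid if a cluster empties
                new_medoids.append(old)
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][o] for o in members)))
        if new_medoids == medoids:       # converged
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids


# Toy vectors: three point roughly along the first axis, three along the second.
vectors = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
labels, medoids = k_medoids(vectors, initial_medoids=[0, 1])  # k = 2, poor start on purpose
print(labels, medoids)
```

Even though both initial medoids come from the same group, the alternating steps move one of them across, which shows why the number of clusters must be supplied while their positions need not be.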

One drawback of the k-medoids algorithm is that it does not estimate the 
number of clusters and thus, the number of clusters must be supplied. There is 
research into estimating the number of clusters; however, the research has not 
yet reached a point at which researchers are comfortable with the accuracy of 
the results (Maitra and Ramler 2010: 380; see also Solka 2008: 103). 

What this means in practice is that the scholarship on early Irish literature
drawn upon above plays a role in assessing the accuracy of the results. Results
which approximate what other scholars have determined using traditional 
methods are taken as being more credible than those which do not. However, 
this does not mean that a slavish attitude toward either the computational re- 
sults or the previous scholarship should be taken. If the results do not reflect 
what is expected, this could mean that continuing investigations are warranted 
to determine the exact reason for the differences and what these mean for the 
LL text clustering question. 


2.4 Normalisation and Principal Component Analysis


Consider the matrix of LL: the frequency of each function word is often 0 and the 
number of possible function words is 1,125. In this context, each function word 
represents a dimension along which the vector resides. In other words, the vector 
space within which LL sits has 1,125 dimensions. When the k-medoids algorithm 
is applied to such a space, the number of dimensions contributes to the difficulty 
of finding optimal clusters. 

This argues for the use of orthographic normalisation to reduce the number of
potential function words, and hence dimensions, with the intention of reducing the
size of the space and obtaining better clusters. However, normalisation is not a
simple operation in the context of early Irish orthography. First, removing initial
mutations would reduce the number of dimensions, but not by much. Once this was
done, a more difficult task would present itself: which of the different
orthographic variants should be chosen as the norm? In addition, if a variant from
a particular linguistic period was chosen, would this force other texts which may
have been written earlier or later to act like the normalised version? For the
purposes of this study, the lack of normalisation is a bearable problem. The
complications of normalisation are considerable and deserve their own discussion,
which is outside the scope of the current examination.

In any case, a far more sophisticated and illuminating method than normal- 
isation is known as Principal Component Analysis (Abdi and Williams 2010) 
which attempts to capture as much of the variance as possible in as few dimen- 
sions as possible. Additionally, Principal Component Analysis allows visualisa- 
tion of k-medoids. 
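
The idea of capturing as much variance as possible in few dimensions can be sketched as follows. The example centres the data, forms the covariance matrix, and finds the leading directions by power iteration with deflation. This is a deliberately simple stand-in chosen so that the sketch needs only the standard library, and it is not claimed to be the routine used to produce Figure 1. The four toy points are invented.

```python
import math


def center(rows):
    """Subtract each column's mean from its entries."""
    means = [sum(column) / len(rows) for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (dimensions x dimensions)."""
    n, dims = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]


def multiply(matrix, vector):
    return [sum(m * v for m, v in zip(matrix_row, vector)) for matrix_row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration from a fixed start vector; returns (unit vector, eigenvalue)."""
    start = [1.0 + i for i in range(len(matrix))]
    norm = math.sqrt(sum(x * x for x in start))
    vector = [x / norm for x in start]
    for _ in range(iterations):
        product = multiply(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:          # no variance left in any direction
            break
        vector = [x / norm for x in product]
    eigenvalue = sum(v * p for v, p in zip(vector, multiply(matrix, vector)))
    return vector, eigenvalue


def pca(rows, n_components=2):
    centered = center(rows)
    cov = covariance(centered)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    components, ratios = [], []
    for _ in range(n_components):
        vector, eigenvalue = leading_eigenpair(cov)
        components.append(vector)
        ratios.append(eigenvalue / total_variance)
        # Deflation: remove the variance already explained by this component.
        cov = [[cov[i][j] - eigenvalue * vector[i] * vector[j]
                for j in range(len(cov))] for i in range(len(cov))]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, components, ratios


rows = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # toy data, not LL
scores, components, ratios = pca(rows)
print(ratios)
```

The `scores` are the coordinates that would be plotted on the x-axis and y-axis of a figure such as Figure 1, and `ratios` gives the share of the total variance carried by each axis.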

As one can see from the graph below (Figure 1), a majority of texts in LL fall
in a fairly small range. This means that the variation between the texts is fairly
small: either the texts in LL were picked carefully to be very similar or they
were edited in the process of copying to be more similar to each other. The
farthest outliers in Figure 1 are: on the x-axis, Dindgnai Temrach and, on the y-axis,
Cóica epscop dodeochatar dochum Moedoc Ferna do Bretnaib Cille Muine and 
Drochcomaithech robaí i n-ocus dosom. The Principal Component Analysis ulti- 
mately shows that the texts have fairly low variance, which means that the use of 
function words tends to be uniform across the texts included in the analysis. One 
interpretation of this result is that the scribes could have chosen texts which used 
function words in a regular fashion. Another, more likely interpretation, and com- 
monly understood by modern scholarship, is that the scribes were involved in not 
only copying texts but also changing those texts and by those changes, used func- 
tion words in a consistent fashion. 


Figure 1: Principal Component Analysis.


3 Analysis


As shown above in the discussion of Principal Component Analysis, the variation 
between the texts is constrained. Without more texts and texts which contain 
more variation in the use of function words, this makes any clustering solution 
rather weak. However, pressing on with the final calculations will demonstrate 
how to interpret and analyse the results of machine learning techniques on early 
Irish material. Therefore, three differing classes of further analysis will be pre- 
sented. The first uses known authors as the basis upon which to understand the 
clustering solutions provided. The second will investigate the scribal hands as 
the main basis to understand the clusters provided. The third will use genre iden- 
tification as the main method to understand the clustering solution provided by 
the k-medoids algorithm. 


3.1 Authors 


Once all the calculations from above are complete, the texts are ready for the 
final step. In this section, two k-medoid analyses will be presented. As stated 
above, estimating the number of clusters is still an area of active research. Thus, 
attempting to estimate the number of clusters is a subjective process. One method
is to use a set of texts which have a known author, then attempt to fit that set to
the output of k-medoids. To illustrate the analytic techniques involved in
unsupervised machine learning analysis, two known authors who may assist in
evaluating the accuracy of any particular clustering solution are used: Flann
Mainistrech and Gilla Cóemáin (Smith 2007), who both wrote historical poems
which are included in LL and are fairly well known. There are other known authors
in LL, for instance Fothad na Canone, Ailill Olomm, and Flann Fina, but they will
not be included in the analysis, to keep the points being illustrated here clear.
In the case of Flann Mainistrech, the poems attributed to him in LL are listed
below as presented by Pódór (1999):
- Éstid a eolchu cen ón [Listen, scholars, without flaw]; as stated above in 2.1,
  this is not included.
- Rig Temra dia tesbann tnú [The kings of Tara, without envy]
- In éol dib in senchus sen [Do you know the old tradition . . . ?]
- Mide maigen Chlainne Cuinn [Mide, homestead of the descendants of Conn]
  (Smith 2001: 108-144)
- Cia triallaid nech aisnis [Whoever attempts to tell the story] (Gwynn 1991
  [1903-1935], 4: 100-107)
- Cind cethri n-dini iar Frigrind [At the end of four generations after Frigriu]
  (MacNeill 1913: 48-54)
- Ascnam ni seól sadail [Let us proceed - it is no easy undertaking] (MacNeill
  1913: 48-58)
- Ani do ronsat do chalma [What Eogan’s race have done of valiant deeds]
  (MacNeill 1913: 59-70)
- An gluind, a n-echta [Their deeds, their death-dealings] (MacNeill 1913:
  70-82)
- Mugain ingen Chonchraid chain [Mugain, daughter of righteous Concrad]
- Sil Aeda Slaine na sleg [The race of Aed Sláne of the spears] (MacNeill 1913:
  92-99)


While not included in Pódór's list, the following are also added:
- Rig Themra toebaige iar tain [The kings of many sided Tara, after that]
- A Gillu gairm n ilgrada [O lads of the names of great rank]


The poems ascribed to Gilla Cóemáin are listed below (Smith 2007: 25-27):
- Hériu ard inis na rríg [Lofty Ireland, island of the kings] (Smith 2007:
  104-169)
- At-tá sund forba fessa [Herein is the apex of knowledge] (Smith 2007:
  170-187)
- Annálad anall uile [All the annal-writing heretofore] (Smith 2007: 188-211)


With the above in mind, it is time to consider the k in k-medoids. The variable k
represents the number of clusters for which a solution is found by the algorithm.²
The clustering solution places all texts in LL into the number of clusters signified
by k. A cluster could represent an author, so that k could equal the number of
authors of LL. The possibility that clusters do not represent authors is explored
below. As previously mentioned, there is not yet a reliable way to estimate the
number of clusters in a clustering solution. This leaves k arbitrary, although not
wholly so. In particular, the method chosen for this chapter is to adjust k
iteratively until the texts begin to coalesce into clusters which resemble the above
lists of texts. This began to happen when k = 15. Using a technique called a
Silhouette, which is described below in section 3.3, the choice was further refined
until it was decided to use k = 27, which gave the best clustering solution and thus
suggests 27 authors of LL. As mentioned above, known authors can constrain k to a
number at which their texts begin to cluster together. To test this hypothesis, we
use known authors with known lists of texts in LL. First we will examine Flann
Mainistrech and where the texts which are attributed to him fall within the
clustering solution.

2 The full clustering solutions in CSV format for each value of k used in this paper are
available at: https://github.com/cyocum/bol_project/tree/master/clustering_solutions

Table 1 is a subset of the clustering solution for all the texts of LL, restricted
to those for which Flann Mainistrech is the attributed author. The Cluster ID is the
number of the cluster assigned by the algorithm. All texts assigned the same Cluster
ID are within the same cluster. As the reader will notice, not all of Flann’s texts
are clustered together and a few are spread out among different clusters. Moreover,
one will also notice that all the poems in cluster 10 are of the same genre: they
are all historical poems which contain many personal names, which reduces the
overall frequency of function words. The reason for this may be that the style and
metre are tightly constrained and thus the function word use is similar across all
the poems. This might mean that there are fewer function words and that the same
ones are used frequently. However, authorship may still be preserved. These results
may just be a consequence of the choice of known authors, who are both noted for
writing mostly historical poems. A fuller analysis of this phenomenon is given
below.


Table 1: Flann Mainistrech where k = 27. 


Cluster ID  Title                                      Volume  Scribe (Schlüter)  Scribe (Duncan)
None        Estid a eolchu cen ón                      1       None               None
10          Rig Temra dia tesbann tnú                  3       U                  U
10          Mide magen clainne Cuind                   4       U                  U
10          Cind cethri ndini iar Frigrind             4       U                  U
10          Sil Aeda Slane na sleg                     4       U                  U
10          Ani doronsat do chalmu clanna Eogain       4       U                  U
10          Ascnam ni seol sadal                       4       U                  U
3           Angluind a n-echta a n-orgni batar infhir  4       U                  U
3           Inn eól dúib in senchas sen                3       U                  U
6           Mugain ingen Chonchraid chain              3       U                  U
6           A Gillu gairm n ilgrada                    1       T                  T2


Moving to Gilla Cóemáin, the situation is even more pronounced:


Table 2: Gilla Cóemáin where k = 27. 


Cluster ID  Title                                      Volume  Scribe (Schlüter)  Scribe (Duncan)
10          Hériu ard inis na rríg                     3       A                  A
11          Attá sund forba fessa                      3       U                  U
17          Annalad anall uile                         3       U                  U


No two of Gilla Cóemáin's historical works fall into the same cluster. This
surprising result suggests that our assumptions regarding the ascription of the
texts to Gilla Cóemáin may be incorrect in this instance. On the one hand, that
Hériu ard inis na rríg clusters with historical texts by Flann Mainistrech in
cluster 10 may mean that there is enough information to cluster it with other
historical poems. On the other hand, that Annalad anall uile clusters with other
historical poems in cluster 17 marginally supports an argument that the clustering
algorithm is identifying genre rather than authorship. As for Attá sund forba fessa
in cluster 11, this poem is in the same cluster as another didactic poem, Sluindfet
duib dagaisti in dana (Thurneysen 1912: 73-77), and a poem about a quarrel between
an old woman and a retainer of the king of Leinster, A bairgen ataí i ngábud (Ua
Nualláin 1904). However, this does not negate the interpretation of clustering
solutions as authorship. It may instead mean that Gilla Cóemáin did not write these
texts and that they are being placed with their anonymous authors, who may also
have written historical poems. In addition, the limitations of the present
methodology may be interfering with the placement of the texts.
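
One simple way to summarise how far an author's texts coalesce is to tabulate their Cluster IDs and report the share that falls in the largest cluster. The sketch below does this for the Cluster IDs printed in Tables 1 and 2 (the row without a Cluster ID is omitted). The function name and the idea of a single "share" figure are illustrative assumptions and are not a measure used in the chapter.

```python
from collections import Counter


def cluster_profile(cluster_ids):
    """Return (counts per cluster, largest cluster, share of texts in that cluster)."""
    counts = Counter(cluster_ids)
    cluster, size = counts.most_common(1)[0]
    return counts, cluster, size / len(cluster_ids)


# Cluster IDs as listed in Table 1 (Flann Mainistrech, k = 27).
flann_k27 = [10, 10, 10, 10, 10, 10, 3, 3, 6, 6]
# Cluster IDs as listed in Table 2 (Gilla Cóemáin, k = 27).
gilla_k27 = [10, 11, 17]

flann_counts, flann_cluster, flann_share = cluster_profile(flann_k27)
gilla_counts, gilla_cluster, gilla_share = cluster_profile(gilla_k27)
print(flann_cluster, flann_share)
print(gilla_cluster, gilla_share)
```

A share near 1 would mean that nearly all of an author's texts sit together; the share for Gilla Cóemáin cannot exceed one third here because each of his three poems is in a different cluster.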


3.2 k = 20


The only change between this clustering solution and the previous one is that k was
set to 20 rather than 27.³ All other parameters were kept the same for the sake of
consistency and comparison. As would be expected, the change in the clustering
solution is small. Most texts stay in the same clusters even if the cluster numbers
have shifted. However, some texts, like Cia triallaid nech aisnis, which was in the
same cluster as those identified with Flann Mainistrech in the case of k = 27, have
moved where k = 20 (see Table 3 below). This means that the text was probably near
the boundary between two clusters, and thus two authors, and was assigned to the
correct author by k-medoids but, when the number of clusters changed, the text was
assigned to a new cluster based on the new boundaries calculated. This is why a
quality check using Silhouettes, as described below, and previous scholarship on
early Irish literature are consulted to detect these situations and determine the
best clustering solution.


Table 3: Flann Mainistrech where k = 20. 


Cluster ID  Title                                      Volume  Scribe (Schlüter)  Scribe (Duncan)
None        Éstid a eolchu cen ón                      1       None               None
2           Aní doronsat do chalmu clanna Eogain       4       U                  U
9           Mugain ingen Chonchraid Chain              3       U                  U
11          Cia triallaid nech aisnis                  4       U                  U
14          A Gillu gairm n ilgrada                    1       T                  T2
19          Mide magen clainne Cuind                   4       U                  U
19          Cind cethri ndíni iar Frigrind             4       U                  U
19          Síl Aeda Sláne na sleg                     4       U                  U
19          Ascnam ní seól sadail                      4       U                  U
19          Angluind a n-echta a n-orgni batar infhir  4       U                  U
19          Inn eól dúib in senchas sen                3       U                  U
19          Ríg Themra dia tesband tnú                 3       U                  U
19          Rig Themra toebaige iar tain               3       U                  U


For Gilla Cóemáin, the clustering solution where k = 20 (see Table 4 below) is
much the same. No two of his known historical poems share a cluster.

While the focus has been on Flann Mainistrech and Gilla Cóemáin, there are other
indications that the clustering solution is registering authorship rather than
style. If the analysis is done with k = 5, one will notice that the texts of Fothad
na Canone, Ailill Olomm, and Flann Fina cluster together. Thus, it seems, given the
amount of clustering in both Flann and other cases, that this method gives clues as
to the underlying authorship of various texts in LL.


Table 4: Gilla Cóemáin where k = 20.


Cluster ID  Title                                      Volume  Scribe (Schlüter)  Scribe (Duncan)
4           Annalad anall uile                         3       U                  U
15          Atta sund forba fessa                      3       U                  U
19          Hériu ard inis na rríg                     3       A                  A


3.3 Silhouettes 


While the above may, at first glance, look like an open-and-shut case in which
the clustering and the scholarship coincide, there is more to the story than
Flann and Gilla. Silhouettes are used to validate and interpret clustering
solutions (Rousseeuw 1987). Silhouettes are a measure of how well each text
resides in its cluster and thus of the quality of the clustering solution.

The Silhouette histogram (Figures 2 and 3) is a graph whose x-axis is bounded
between -1.0 and 1.0. The closer to 1.0 a text falls, the better it sits within its
own cluster and the further it is from neighbouring clusters. The closer to 0 a text
falls, the closer the text is to the boundary between clusters. The closer to -1 a
text falls, the more likely it is to be in the wrong cluster. The y-axis counts how
many texts have the same value on the x-axis. Ideally, for a good clustering
solution, there should be more bars on the positive side of 0.0 on the x-axis,
meaning that the texts are in the correct clusters.
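
The Silhouette value of a single text can be written out directly. For a text i, let a be its average distance to the other members of its own cluster and b the smallest average distance to the members of any other cluster; the value is (b - a) / max(a, b). The sketch below follows that definition on four invented one-dimensional points with absolute difference as the distance, whereas the chapter uses cosine distance between document vectors. Texts that are alone in their cluster are given 0, following the usual convention, and at least two clusters are assumed.

```python
def silhouette_values(dist, labels):
    """Silhouette value for every point, given a distance matrix and cluster labels."""
    clusters = sorted(set(labels))
    values = []
    for i, own in enumerate(labels):
        same = [j for j, label in enumerate(labels) if label == own and j != i]
        if not same:                     # singleton cluster
            values.append(0.0)
            continue
        a = sum(dist[i][j] for j in same) / len(same)
        b = min(
            sum(dist[i][j] for j, label in enumerate(labels) if label == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        largest = max(a, b)
        values.append(0.0 if largest == 0.0 else (b - a) / largest)
    return values


points = [0.0, 1.0, 10.0, 11.0]          # toy data
dist = [[abs(p - q) for q in points] for p in points]

good = silhouette_values(dist, [0, 0, 1, 1])   # sensible grouping
bad = silhouette_values(dist, [0, 1, 1, 1])    # second point grouped with the far pair
print(good)
print(bad)
```

In the sensible grouping every value is close to 1, while in the second labelling the misplaced point receives a strongly negative value, which is the pattern the histograms are read for.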

As one can see both where k = 20 (Figure 2) and where k = 27 (Figure 3), a large
number of texts fall on the border between two clusters, around zero on the x-axis
of the graph; that is, they fall on the edges of clusters. Many others fall in the
negative area of the x-axis, which in turn means that they are probably in the
wrong cluster. This is true for most clustering solutions used in the analysis.
Increasing the number of clusters (the size of k) should provide a solution for
this but, as stated above, texts which are known to be composed by different
authors then start to be placed into the same clusters. Given the constraints of
known authorship and the use of function words rather than POS tagging, and since
k = 27 has numerically more texts on the positive side of 0.0 on the x-axis than
k = 20, k = 27 is the best clustering solution presently available. In fact, none
of the clustering solutions is entirely satisfactory. Once an accurate POS tagger
for Old and Middle Irish is created, the results from applying k-medoids to the
vectors created from the POS tagger can then be investigated and checked against
the current analysis to see how well this method worked and how well, overall,
these statistical methods can work with material like the Old and Middle Irish
corpus.

Figure 2: Silhouette where k = 20.

Figure 3: Silhouette where k = 27.


3.4 Scribes


As is well-known, scribes in medieval Ireland were not above making changes 
to their source material. This included updating the language in various ways: 
modernising the spelling, moving sentences, and, most importantly for this 
analysis, changing function words in various texts (Boyle and Hayden 2014: 
xxxvii-xlvi). Thus, while an argument can be made that k-medoids analysis can 
identify authors, there is an equally strong chance that the texts cluster because 
of this kind of scribal “editorial” activity regardless of the ascribed authorship. 

Elizabeth Duncan has identified the nine scribes of LL and catalogued the 
folios written by each (Duncan 2012). She does not, however, give a table matching 
folios to texts. Duncan’s tables can be supplemented by the appendix supplied by 
Schlüter (2010: 226-243), which does give the scribal hand for each text. Correlating 
the two tables gives a good indication of what each scribe wrote. In the case of 
overlapping scribal activity, the text is awarded to the scribe who contributed 
the greatest number of folios to that text. This is, of course, arbitrary and 
open to criticism, but it suits the purposes of this analysis. 
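
The attribution rule just described (a text goes to the scribe with the most folios in it) can be sketched as follows. This is an illustration in Python only: the function name, the input format and the alphabetical tie-break are assumptions, since the chapter does not say how ties were resolved, and the example folio lists are invented.

```python
from collections import Counter


def assign_scribe(folio_hands):
    """Return the scribe who wrote the most folios of one text.

    folio_hands: one scribal siglum per folio, e.g. ["F", "F", "S"].
    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    counts = Counter(folio_hands)
    most = max(counts.values())
    return sorted(s for s, n in counts.items() if n == most)[0]


# Invented example: a text copied mostly by F with one folio by S.
print(assign_scribe(["F", "F", "S"]))
```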

Using k-medoids analysis on the scribes of LL is much simpler than attempting 
the analysis for authors, because much more is known about the hands involved 
in the manuscript. The fact that scribes sometimes attribute a text to the wrong 
author (for instance, by adding a note above the beginning of a text) is also 
less of a problem here. Because the physical activity of the scribes is recorded 
in the manuscript itself, there is less uncertainty about which scribe wrote which 
section of text than there is in the more speculative analysis of authorship. 

The results for the clustering solution where k = 9 (see Appendix) show that 
the scribes’ works intermix freely and do not necessarily cluster together, as one 
would expect if scribes were strongly represented in the texts. This indicates 
that the scribes were not strongly influencing the texts themselves. 
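
One way to make "intermix freely" concrete is to ask, for each cluster, what share of its texts belongs to the cluster's most frequent scribe. The Python sketch below does this for two clusters only; the rows are the Duncan attributions for clusters 5 and 7 as given in Table 5 of the Appendix, while the function name and the share measure are assumptions of the sketch, not part of the chapter's method.

```python
from collections import Counter, defaultdict

# (cluster ID, scribe according to Duncan) for clusters 5 and 7 of Table 5.
ROWS = [
    (5, "U"), (5, "U"),
    (7, "A"), (7, "T1"), (7, "U"), (7, "T1 (over A)"), (7, "A"),
]


def scribe_share_by_cluster(rows):
    """Map each cluster to (most frequent scribe, share of the cluster's texts)."""
    by_cluster = defaultdict(Counter)
    for cluster, scribe in rows:
        by_cluster[cluster][scribe] += 1
    shares = {}
    for cluster, counts in by_cluster.items():
        scribe, n = counts.most_common(1)[0]
        shares[cluster] = (scribe, n / sum(counts.values()))
    return shares


print(scribe_share_by_cluster(ROWS))
```

A share close to 1.0 in every cluster would point to clustering by scribe; the small cluster 5 reaches it, but cluster 7 is split across several hands.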


4 Genre 


Defining different genres in early Irish literature is a problem which has 
exercised both the early Irish themselves and modern scholars. The problem is one of 
principles of categorisation. While modern scholars tend to organise the tales 
into a series of “cycles”, the medieval Irish organised the tales by the main action 
of the story (for discussion of medieval Irish literary theory, see Coileain 1974; 
Backhaus 1990; Poppe 1999, 2008; Stam 2010: 66-68). Mac Cana (1980: 41-73) 
compiled the canonical list of medieval Irish genres from manuscript sources in 
The Learned Tales of Medieval Ireland, a summary of which is given below. 
There are twenty genres according to Mac Cana’s list. The question, then, is 
whether k = 20 causes the texts to fall into the genres proposed by Mac Cana 
(1980: 73-81). If it does, then k-medoids could be clustering by genre as 
defined in the tale lists rather than by author (Juola and Baayen 2005). 

- Aided ‘death-tale, violent death’
- Aithed ‘elopement’
- Baile/buile ‘vision, frenzy’
- Cath ‘battle’
- Compert ‘conception, begetting, procreation’
- Echtra(e) ‘expedition, journey (to the otherworld), adventure’
- Fess/feis ‘feast’
- Fis ‘vision’
- Forfess/forbais ‘beleaguering, siege, night-watch’
- Im(m)ram ‘sea-voyage’
- Orcrain/orcun ‘murdering, ravaging’
- Serc ‘love’
- Slúagad/slógad ‘a hosting, a military expedition’
- Táin ‘driving off, cattle raid’
- Tochmarc ‘wooing, courting’
- Togail ‘attack, destruction; attacking, destroying’
- Tomaidm ‘bursting forth of lake or river’
- Uath ‘terror, horror’ (although it seems to be a late genre) (Mac Cana 1980: 81)


A simple example will suffice for an answer. In the clustering solution for k = 20, 
four Aided tales (Aided Cheltchair meic Uthechair, Aided Cuanach meic Ailchini, 
Aided Derb Forgaill, Aided Meidbe) appear in cluster 11; however, this is misleading, 
as cluster 11 is the largest, with forty entries (too many to show here; see rather 
the data referenced in footnote 5), and contains many other kinds of tales which 
are obviously not related. Two other Aided texts appear in cluster 2, but cluster 2 
contains twenty-two other texts. While this is only one example genre, the same 
holds true for the other tale types, especially since some of them cluster together 
with each other rather than in their own clusters. For instance, 
cluster 11 contains two táin texts (Táin Bó Flidais and Táin Bó Fraích) and two 
cath texts (Cath Carn Chonaill and Cath Maige Mucrima), which are among many 
texts that are decidedly not aided texts. This argues strongly against the 
clustering solution reflecting the early Irish tale lists. 
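
The same kind of check can be sketched in Python by tallying, for each tale-list term, which clusters hold titles that begin with it. The k = 20 data discussed above are not reproduced in this chapter, so the rows below are a handful of titles from the k = 9 solution in Table 5 and serve only as an illustration; the function name and the first-word matching rule are assumptions of the sketch.

```python
from collections import Counter

# (cluster ID, title): a few rows from the k = 9 solution in Table 5.
ROWS = [
    (4, "Aided Derb Forgaill"),
    (4, "Aided Meidbe"),
    (4, "Echtra Laegaire meic Crimthainn"),
    (9, "Aided Cheltchair meic Uthechair"),
    (9, "Aided Choncobuir"),
    (9, "Cath Carn Chonaill"),
    (9, "Cath Maige Mucrima"),
]


def clusters_by_genre(rows, genre_terms):
    """Count, per genre term, how many titles starting with it fall in each cluster."""
    spread = {term: Counter() for term in genre_terms}
    for cluster, title in rows:
        first_word = title.split()[0]
        if first_word in spread:
            spread[first_word][cluster] += 1
    return spread


for term, counts in clusters_by_genre(ROWS, ["Aided", "Cath", "Echtra"]).items():
    print(term, dict(counts))
```

If the clusters followed the tale lists, each term would be confined to a single cluster and each cluster to a single term; in this sample the Aided titles are already split between clusters 4 and 9, and both clusters hold more than one tale type.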




5 Conclusion 


The fundamental question asked at the beginning of this chapter was: how can 
unsupervised machine learning and statistical techniques be used in assessing 
authorship attribution in early Irish texts? An attempt was made to answer this 
question from a machine learning perspective, using tf*idf and spherical k-medoids 
analysis to create a methodology from which a clustering solution was produced 
using the Julia programming language. This methodology was then applied to 
LL with a number of texts removed for various reasons. Three ways of 
understanding the clustering solutions were then attempted: author, genre, and 
scribe. For authorship, the attributed texts in LL for Flann Mainistrech and 
Gilla Cóemáin were used to understand the clustering solutions k = 20 and 
k = 27. In this particular case, it seems that, while Flann Mainistrech’s texts 
tend to cluster together, Gilla Cóemáin’s do not, which would suggest that the 
texts ascribed to Gilla Cóemáin may not have been written by him but 
were written by others and subsequently ascribed to him. From the foregoing 
analysis, it would seem that there are 27 authors in LL. These clustering solutions 
were quality-assessed using Silhouettes, which showed some difficulties with the 
clustering solution. Two further clustering solutions were created: one that 
attempted to match known early Irish genres to clustered texts, and one that 
attempted to match clusters of texts to known scribal hands. In both cases, no 
correlation was found. This result suggests that scribal hands and genre are not 
useful when attempting to attribute authorship and should be avoided. 

As identified in this paper, the methodology is strict; however, there is 
ample room for further improvement and research. For instance, the use of POS 
tagging rather than function words, and the reduction of the dimensionality of 
the resulting matrix by orthographic or other normalisation techniques, 
should be investigated. This research may increase the accuracy and quality of 
the clustering solutions as measured using Silhouettes and bring more scholarly 
interest to this style of analysis. The method is not necessarily conclusive, 
but rather suggestive, and can help guide future research into the issue. In a 
broader sense, once the methodological difficulties are overcome, this 
methodology is possibly applicable to all of early Irish literature for which we have 
electronic versions. 

Additionally, there is also room for more texts. For instance, Lebor Gabála 
was left out because of its complex textual and scholarly history. Moreover, the 
way in which the texts were separated in the electronic versions caused some 
known texts to be excluded. These exclusions directly influence the accuracy 
and reliability of the clustering solution. Further research into creating methods 
for extracting these texts in a coherent way, given that they are intimately 
bound into the textual history of their enclosing text, and presenting them for 
analysis is also necessary. 

Using unsupervised machine learning techniques and methods as pre- 
sented here to answer questions of authorship in early Irish opens up new ave- 
nues of research and discovery, not just for LL, but for the whole of early Irish 
literature. 


Appendix 


The clustering solution where k = 9 is presented below in Table 5. 


Table 5: k = 9. 

Cluster ID | Title | Volume | Scribe (Schlüter) | Scribe (Duncan)
1 | A Gillu gairm n ilgrada | 1 | T | T1
1 | Cinti crábuid gnathaigthe scoile Sinchil | 6 | A | A
1 | Cormac mac Culennain Iarfaiged nech acaib dam | 1 | T | T4
1 | Diarmait mac Cerbaill mairg thocheas rí clerchib ceil | 3 | U | U
1 | Dublitir hua Uathgaile rédig dam a Dé do nim | 3 | U | U
1 | Dubthach hua Lugair Andsu immarbáig ri Lagnib | 1 | A | A
1 | Dubthach hua Lugair Crimthan clothri cóicid Hérend | 1 | A | A
1 | Feidlimid athair Echach | 1 | A(?) | A
1 | Fland Fina in rigan ecanaid óg fíal | 3 | U | A+U
1 | Fland Fina Ro ddet a hInis find Fail | 1 | A | A
1 | Fland Mugain ingen Chonchraid chain | 3 | U | U
1 | Fothad na Canone Cert cech rig co rréil | 3 | U | U
1 | Fothad na Canone Eclais Dé bi | 3 | U | U
1 | Gilla Cóemáin Annalad anall uile | 3 | U | U
1 | Gilla Mo Dutu Adam óenathair na ndórene | 3 | U | U
1 | Gilla in Chomded úa Cormaic a rig ríchid reidig dam | 3 | U | U
1 | Gilla na Naem Hua Duind Cuiced Lagen na lecht rig | 1 | A | A
1 | Mac Cosse of Ros Ailither Rofessa i curp domuin dúir | 3 | U | U
1 | Teist Chathail meic Finguine | 3 | U | U
1 | Tri Fothaid Elgga cen chron | 3 | U | U
2 | Angluind a n-echta a n-orgni batar infhir | 4 | U | U
2 | Ascnam ni seol sadal | 4 | U | U
2 | Brislech Mór Maige Muirthemni | 2 | U | U
2 | Cind cethri ndíni iar Frigrind | 4 | U | U
2 | Cináed húa Artacáin Fianna Bátar i nEmain | 1 | A | A
2 | Echta Lagen for Leth Cuind | 1 | A | A
2 | Fland Mainistrech Rig Themra dia tesband tnú | 3 | U | U
2 | Fland Manistrech Inn eól dúib in senchas sen | 3 | U | U
2 | Gilla Cóemáin Hériu ard inis na rríg | 3 | A | A
2 | Guidim Comdid cumachtach | 1 | A | A
2 | Inis Dornglais ro gab Crimthann | 4 | A | A
2 | Mael Muru Othna Can a mbunadas na nGaedel | 3 | U | U
2 | Marb Cairpre Masc co n-áne | 1 | T | T2
2 | Mide magen clainne Cuind | 4 | U | U
2 | Mugdorn ingen Moga Duib Conan gilla Find | 4 | A | A
2 | Ossin Ogum i llia lia úas lecht | 3 | U | U
2 | Síl Aeda Slane na sleg | 4 | U | U
2 | Turim Tigi Temrach | 1 | T | T2
3 | A Maccáin ná cí | 3 | U | U
3 | Ailill Ólomm beir mo scíath fri Gath | 3 | U | U
3 | Ani doronsat do chalmu clanna Eogain | 4 | U | U
3 | Cinaed úa hArtacain a cholch thall for elaid úair | 3 | U | U
3 | Dallán mac More Cerball Currig cáemLife | 1 | A | A
3 | Dian airing | 1 | T | T1
3 | Orthanach húa Cáellama A Chóicid chóem Chairpri chrúaid | 1 | A | A
4 | A ben bennacht fort na raid | 5 | F | F
4 | Aided Derb Forgaill | 2 | U | U
4 | Aided Meidbe | 2 | U | U
4 | Audacht Morainn | 5 | S | S
4 | Clanna Ailella Uluim uill | 3 | U | U
4 | Cathcharpat serda | 4 | A | A
4 | De Gabail intSida | 5 | F | F
4 | De dúlib feda na fored | 1 | T | T1
4 | Dá brón flatha nime | 5 | F | F
4 | Días macclerech | 5 | F | F
4 | Echtra Laegaire meic Crimthainn | 5 | F | F
4 | Fechtas aile do MLing is Toídin | 5 | (F?) | F
4 | Mo Lling Luachra dalta do Maehóc Ferna | 5 | (F?) | F
4 | Mo Lling Rochúala la nech légas libru | 3 | U | U
4 | Nuallguba Emire | 2 | U | U
4 | Senchán Torpéist Apair ri sil nEogain Móir | 3 | U | U
4 | Slan seiss a Brigit co mbuaid | 1 | A | A
4 | Túathal Techtmar ba ri Temrach | 1 | A(?) | A
4 | Esnada Tige Buchet | 5 | F | F
5 | Cu Chulaind atbert. De aduentu Christi | 2 | U | U
5 | Maiccni Echach ard a ngle | 3 | U | U
6 | A Chormaic coisc do maicni | 3 | U | U
6 | Audacht Moraind | 6 | A | A
6 | Augaine Már mac ríg Hérend | 1 | A(?) | A
6 | Ciaran cecinit | 6 | A | A
6 | Dialogue between Brendan and Moínenn | 6 | A | A
6 | Fiacail Patric | 6 | A | A
6 | Fochond Loingse Fergusa meic Roig | 5 | [illegible] | F
6 | Macclerech do muntir Ferna móire | 5 | (F?) | F
6 | Messe bad rí réil | 3 | U | U
6 | Senbriathra Fithail | 6 | A | A
6 | Tecosca Cormaic | 6 | A | A
6 | Tain Bó Flidais | 5 | F | F
7 | Brandub mac Echach | 1 | A | A
7 | Cethri srotha déc éicsi | 1 | T | T1
7 | Fithal 7 Cormac Niba mé linfes do neoch dara thráth | 3 | U | U
7 | Orthanach húa Cáelláma Masu de chlaind Echdach aird | 1 | A | T1 (over A)
7 | Secht mbémmend Brandub for Brega | 1 | A | A
8 | Birth of Brendan | 6 | A | A
8 | Broccán Craibdech Lecht Cormaic meic Culennáin | 1 | A | A
8 | Borama | 5 | S | S
8 | Cellach Húa Rúanada sluindfet dúib dagaisti in dana | 1 | T | T1
8 | Cethrur ar fichet nosfail | 6 | A | A
8 | Clanna Falge Ruis in rig | 1 | A | A
8 | Colum Cille cecinit | 6 | A | A
8 | Connachta cid dia tá in t-ainm | 1 | T | T1
8 | Cuan Hua Lothchain Temair breg bale na fian | 1 | A | A
8 | Epscop Ibar | 6 | A | A
8 | Gilla Cóemáin atta sund forba fessa | 3 | U | U
8 | Rig Themra toebaige iar tain | 3 | U | U
8 | Sarbili anim Mo Ninni | 6 | A | A
8 | Scrín Adomnáin | 6 | A | A
8 | Trea ropo maith in ben | 6 | A | A
8 | Táin Bó Cúalnge | 2 | T(+F) | T1+F
8 | Etsecht Luin Garad | 6 | A | A
8 | Uar in lathe do Lum Luine | 3 | U | U
9 | A bairgen ataí i ngábud | 1 | A | A
9 | Aided Cheltchair meic Uthechair | 2 | U | U
9 | Aided Choncobuir | 2 | U | U
9 | Aided Cuanach meic Ailchini | 5 | F+A | F
9 | Aided Guill meic Carbada 7 Aided Gairb Glinne Rigi | 2 | U | U
9 | Aigidecht Aithirne | 2 | U | U
9 | Baí rí amra de Grécaib Salemón a ainm | [illegible] | F | F
9 | Beochobra Con Culaind isind ló fúair bás | 2 | F | U
9 | Buí siur Mo Lassi Lethglinni oc légund i fail Mo Lasse | [illegible] | F | F
9 | Caillech dorat a mac désum do Mling | 5 | (F?) | F
9 | Cath Carn Chonaill | 5 | F | F
9 | Cath Maige Mucrima | 5 | F(+S) | F+S
9 | Cethrur macclerech | 5 | [illegible] | F
9 | Cogad Gaedel re Gallaib | 5 | T | T2
9 | Cormac mac Cuilennain cecinit | 6 | T | T1
9 | Cummíne Fota mac Fiachnai di Eoganacht Chassil | 5 | A | F
9 | Cóica epscop dodeochatar dochum Moedoc Ferna do Bretnaib Cille Muine | 5 | [illegible] | F
9 | De Chophur in da Muccida | 5 | F | F
9 | Dindgnai Temrach | 1 | T | T1
9 | Do fallsigud Tána Bó Cualnge | 5 | F | F
9 | Drochcomaithech ro baí i n-ocus dosom | 5 | (F?) | F
9 | Epscop do Gaedelaib dochoid do Róim | 5 | F | F
9 | Fechtas do Mling is Tóidin co n-acca Mael Doborchon | 5 | (F?) | F
9 | Fechtas dósom oc ernaigthi ina eclais | 5 | (F?) | F
9 | Fland Manistrech Cia triallaid nech aisnis | 4 | U | U
9 | Fothart for trebaib Con Corbb | 1 | A(?) | A
9 | Gormlaith ingen Flain cia dír do chlérchib na cell | 1 | A | A
9 | Gormlaith ingen Flain tanic ar debaid ó Cherball mac Murician | 1 | A+T+2 | T2
9 | Iartaige na hingine colaige do Grécaib | [illegible] | [illegible] | F
9 | Immacallam in dá Thúarad | 4 | U | U
9 | Incipit Cath Ruis na Rig | 4 | A | A
9 | Incipit de maccaib Conaire | 5 | S | S
9 | Longes Chonaill Chuirc | 5 | F | F
9 | Longes mac nUsnig | 5 | U(+M) | U(+M)
9 | Luid Feidilmid Rechtaid ó Themair do sáerchuaird for Laigniu | 1 | A+T+2 | T1 (over A)
9 | Luid Mael Ruain Tamlachta fechtas dia airge | 5 | [illegible] | F
9 | Medb Lethderg Macc Moga Corbb celas clú | 1 | A | A
9 | Mesca Ulad | 5 | M | M
9 | Na Tri Fothaid | 4 | A | A
9 | Noenden Ulad 7 Emuin Macha | 2 | U | U
9 | Orgain Dind Rig | 5 | F | F
9 | Ri irissech ro boi do Grecaib | 5 | F | F
9 | Scél Niall Frossach | 5 | F+A | A
9 | Scél mucci Meic Da Tho | 2 | U | U
9 | Scéla Chonchobuir | 2 | F | F
9 | Senchas Ailidin Chobthaig | 5 | F | F
9 | Sloiged mar rucsat Gréic co Hebrib fechtas n-aile | 5 | F | F
9 | Story of Athirne Ailgessach and Midir of Bri Leith | 2 | U | U
9 | Story of Athirne and Amairgen son of Ecet Salach and Aigidecht Aithirne | 2 | U | U
9 | Talland Etair | 2 | U | U
9 | Tech Midchúarda | 1 | T | T1
9 | Temaile fáid Miled Espáin | 4 | A | A
9 | Tochmarc Ferbae | 5 | U | U
9 | Trefocul | 1 | T | T2
9 | Triar macclerech | 5 | [illegible] | F
9 | Tréide Cétna Labratar Iarna Genemain | 2 | U | U
9 | Trí Dé Donand | 1 | T | T1
9 | Táin Bó Fraích | 5 | F | F
9 | Túarastla Rosa Failgi | 1 | A+T+2 | A
9 | Fingal Ronain | 5 | F(+A) | F
9 | Óenach Talten | 5 | F+A | A
Part 2: Morphosyntactic variation and change 
in medieval Celtic languages 


Liam Breatnach 
5 The demonstrative pronouns 
in Old and Middle Irish 


1 Introduction 


The distinction between stressed and enclitic demonstratives is fundamental. In 
modern editions enclitic forms are usually printed with a preceding hyphen, 
this convention being most frequently observed in the case of the notae au- 
gentes, which are always enclitic, e.g. baitsim-se ‘I baptise’, ad-cobra-som ‘he 
desires’. Unfortunately, however, the distinction is hardly ever observed in the 
case of the demonstratives, which, unlike the notae augentes, have both stressed 
and enclitic forms, which are not consistently differentiated in writing. As Old 
and Middle Irish orthography does not, for example, regularly mark the long vowels 
in the stressed forms só and sé with a length-mark, or separate stressed sin from 
the preceding word by a space, as opposed to writing enclitic -sin as part of 
the preceding word, the most reliable criterion left to us is metrical evidence. 
Accordingly, most of the examples which follow are taken from Old and Middle 
Irish verse.³ Much of what is established here regarding the earlier language 
applies also to Classical Modern Irish, the rules for which are set out by McManus 
(1994: 431-432, §§ 9.4-9.5), although the system there is further 
complicated by the approximation in form of the third singular masculine and 
third plural nota augens, -s(e)an, with the enclitic demonstrative, -s(a)in.⁴ 


1 The examples are taken from GOI § 403. See also Griffith (2008). 

2 For word-division see GOI § 34. In Breatnach (2003) I showed that the stressed forms of the 
demonstrative meaning ‘this’, previously believed to be so, se, with short vowels, are in fact só 
and sé, with long vowels, and thus more differentiated from the enclitic forms than had been 
thought. 

3 Where necessary, I silently introduce hyphens before enclitic forms, and separate stressed 
forms from what precedes. 

4 They differ of course in the quality of the final -n; nevertheless the superficial resemblance 
of these two forms in Classical Modern Irish may have contributed to uncertainty as to whether 
a particular case of sin in an Old or Middle Irish text was enclitic or stressed. The replacement 
of the -m in -som by -n had begun in the late Middle Irish period; a few examples from the 
Book of Leinster are given in Breatnach (1994a: 264 § 10.2), where dóib-sin (LL line 8367), is a 
misinterpretation of dóibsin of the diplomatic edition; this should be read as dóib sin, with the 
stressed demonstrative. 


3 Open Access. © 2020 Liam Breatnach, published by De Gruyter. This work is licensed 
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
D d o/10 9783110680 00g 
Taking the enclitic demonstratives first, a few metrical examples will suffice 
to establish their prosodic status. They are all taken from the Félire Óengusso, a 
text which can be closely dated to c. 800 AD.⁵ The metre of this substantial text of 
591 quatrains is Rinnard, with obligatory rhyme between the finals of the second 
and fourth lines, six syllables in every line, and each line ending in a disyllable. 
This last requirement guarantees the enclitic status of -sin in the example cited 
(rhyming parts are bolded): 


(1)  Paiss Eutaicc la Fintan 
Maeldub, mór a ngáir-sin, 
caingrian ocont sléib-sin, 
dend Eoganacht áin-sin. 
‘The passion of Eutychius, with Fintan Maeldub – great is that shout! – the 
fair sun at that mountain, of those splendid Children of Eogan.’ (Stokes 
1905: verse for 20 Oct.) 


The enclitic status of -sa ‘this’ is confirmed not only by the disyllabic ending in 
the third and fourth lines, but also by the rhyme of slóg-sa, with the demonstrative, 
and tróg-sa, with the nota augens in:⁷ 


(2) Dom-rorbae domm théti, 
ol am triamain tróg-sa, 
iar timnaib ind ríg-sa 
rith ro ráith in slóg-sa. 
‘May it profit me for my comfort, for I am a wretched weary one, the 
course which this host has run according to the commandments of this 
King!’ (Stokes 1905: Prologue 25) 


Similarly in the case of -se, the variant after a palatal consonant, we have the 
demonstrative in the third and fourth lines, and the nota augens in the second 
line in: 


5 Cf. Breatnach (1996: 74-75). 

6 As well as further instances in the verses for 20 Jun., 2 Aug., 12 Oct., 16 Oct. and at Epilogue 
29. 

7 Note also the aicill rhyme between in tráth-sa ‘at this time’, with demonstrative, and ro gád- 
sa ‘I have prayed’, with nota augens (Stokes 1905: Epilogue 411-412). 
(3) A Dé móir not guidiu, 
cluinte mo chneit trúaig-se, 
ro beó iarsin báig-se 
i mbithgnáis int ṡlúaig-se! 
‘O great God, I entreat Thee, hear my wretched sigh! May I be after this 
battle in the everlasting company of this host.’ (Stokes 1905: Epilogue 313) 


The only variation is phonological, either contextual, viz. assimilation of the s- to 
the quality of the preceding consonant, or historical, viz. -so > -sa. Otherwise, the 
same form can be attached to a noun in any case or number preceded by the 
article. 


2 Stressed demonstratives and their flexion 
in Old and Middle Irish 


In these, there is a degree of variation in the forms. I will take the forms and 
range of use of sin, and of só/sé, where the referent is inanimate, and then in- 
stances of both of these with animate referents. 

As for the demonstrative pronoun for 'that', apart from whether or not in 
precedes it, and the rare variant sen, there is no variation in its form for case 
inflexion, that is, it is always spelled (in) sin, and the final -n is palatalised, as 
shown by the rhyming examples below. All the Old Irish examples of the rare 
variant sen cited in DIL (S 231.8) are singular, and are from the tract on the 
Mass in the Stowe Missal, viz. one instance as the subject of the copula, in sen 
‘that’ (Thes. 2: 253.16) and two instances after prepositions, for sen ‘thereafter’ 
(Thes. 2: 252.14) and ho sen suas ‘from that upwards’ (Thes. 2: 255.7). In Middle 
Irish, on the other hand, the form varies between sin, sein, and sain, with some 
rare instances of sen; see Breatnach (1994a: 275 § 10.24).⁸ 


8 A reader adds an example from the Southampton Psalter: linn in sen oc Hiurusalem ‘that [is] 
a pool at Jerusalem’ (Ó Néill 2012: LXIII no. 12). The same gloss also has enclitic -sen in esin 
lind-sen ‘in that pool’. 
2.1 Nominative 


While verbal endings and copula forms allow for distinction between nomina- 
tive singular and plural to be expressed, nevertheless, examples of the plural 
are very rare in Old Irish, only becoming well attested in Middle Irish. Most of 
the examples I have of the nominative plural have animate referents, for which 
see further below in section 4.⁹ 

An example as the singular subject of a (passive) verb is (here and below, 
all relevant demonstratives are glossed in bold; if only part of the example is 
glossed, the translation of the glossed part is underlined):¹⁰ 


(4) Gabthae tí chorcrae imund rig 
lasa senad co ndimbríg 
ba do genuch fo-cres sin, 
be.3SG.PRET for mockery.DAT PV-put.3SG.PRET.PASS that.NOM 
nibu duthracht a chumtaig. 
‘A purple cloak was put about the King by the ignoble assembly; in mock- 
ery that was put about him, not from a desire to cover him.’ (Blathm. 
verse 52) 


A Late Old Irish/Early Middle Irish instance with an intransitive verb is: 


(5)  Lethbairgen 7 ordu éisc 7 lind in topair do-rat Dia dam. 
dom-fic sin cach dia 
PV-1SG-come.3SG.PRES that.NOM every.GEN.SG.MASC day.GEN 
ol sé tria thimthirecht aingel 
‘“Half a loaf and a morsel of fish and the liquor of the well, God has given 
me. That comes to me every day”, said he, “by the service of angels.”’ (LU 
line 1846 [hand H]; author's trans.) 


9 A reader notes two examples with an inanimate referent, viz. it he riaglóri in sin adchomla- 
tar fri epacta ‘those are the regulars which are added to epacts' (Thes. 2: 17 [Carlsruhe Bede 
32?8]), and, in Scél Mongáin: Batar hé sin a imthechta ‘These were his adventures’ (White 
2006: 76, 82). 

10 Further examples with a passive verb are in Blathm. (verse 245), and, in prose, Binchy 
(1962: 60 § 12), and Gwynn (1914: 166.13). 

11 From Immram Curaig Maíle Dúin; the translation deviates slightly from that in Oskamp 
(1970: 139). 
A Middle Irish instance with a transitive verb is: 


(6) ar nis fil do plaig nó dunibad for bith 
nachus bera sin for culu. 
NEG-3PL-bring.3SG.PRES.SUBJ that.NOM upon back.ACC.PL 
‘For there is no plague or mortality on earth which that would not repel.’ 
(Stokes 1891: 430—431 § 21) 


An Old Irish example with the copula is: 


(7) Is ed trá in sin amnin 
COP.3SG.PRES it then the.NOM.SG that.NOM indeed 
ni méte ni thormassid 
ecosc n-aimin airm hi ta 
tegdassa ad-chondarc-sa 
‘That then is indeed—no doubt you can solve it (viz. the riddle)—the 
lovely form, where it is, of the house which I have seen.’ (Thes. 2: 292 
verse 8; author's trans.) 


2.2 Accusative 


When the demonstrative (whether sin or sé) is the object of a verb, the verb may 
be accompanied by an infixed pronoun; see GOI (§ 478) for examples. This is 
only attested in the singular, and with a neuter pronoun. While masculine or 
feminine singular, as well as plural infixed pronouns with só and sin might the- 
oretically be conceivable, none are attested. The instances with a neuter pro- 
noun may, then, be a special case. 

Examples of the accusative are: 


(8) Cenél  do-rigni in sin 
race.NOM PV-do.AUG.3SG.PRET the.ACC.SG that.ACC 


12 Cf. also the example in verse in Thes. 2: 290.14. 

13 For the second line see Breatnach (1983). The poem from which the example is taken has 
been re-edited with translation and notes by Ahlqvist (2018). 

14 A further metrical example is in Thes. 2: 294.13. A reader notes also examples of the asg. of 
sin as the object of comparison after equatives in the Old Irish Glosses, viz. sic bith suthainidir 
sin ainm Solmon, ‘even so lasting will be the name of Solomon’ (Ml. 90°10), and the instances 
in Ml. 36°21, 57°12, 75°7 and 131412 (all with sin). 
ata foraib orbbadail; 
is ainces ngalair cen traig 
a mbith cen flaith fo bithphláig. 

‘The race who did that suffer dispersal of heritage; their being without a 
kingdom under eternal plague is a sickly undiminishing misery.’ (Blathm. 
verse 117) 


(9)  Cenid relcset Iudei sin, 
though-NEG-3SG.NEUT-suffer.AUG.3PL.PRET Jew.NOM.PL that.ACC 
coiniud Crist dia sainmuintir, 
nem cona airbrib—trén dü!— 
ro-coínset uili Ísu. 
‘Though the Jews did not suffer that Christ should be mourned by his own 
people, Heaven (strong place!) and its hosts, all mourned Jesus.' (Blathm. 
verse 128) 


Examples from Middle Irish texts are: 


(10) Ro airigestar Marggíni X gilla Óchinn sein 
AUG-observe.3SG.PRET Margíne.NOM servant.NOM Óchinn.GEN that.ACC 
‘Margíne, the servant of Óchinn, observed that.’ (LL line 21149 [Prose 
Dindsenchas]; author's trans.) 


A probable instance of the plural is:¹⁵ 


(11) do-ratsat sain uile n-óg 
PV-givess pw; thatacc allnomsc.neur "^ completeyo sc eur 
buidni Banba cen bithbrón 
‘The hosts of Banba, free from enduring sorrow, gave all these completely 
[as pledges].’ (LL line 25233; trans. MD 3: 11)¹⁶ 


15 The preceding two verses consist of a list of what was given in pledge, e.g. Eich claidib... 
gai sceith ‘horses, swords, spears, shields’, and it seems unlikely that these are being referred 
to collectively by a singular sain. 

16 I take uile n-óg as an adverb (lit. ‘completely and entirely’), which probably goes back to an 
Old Irish neuter substantive uile followed by the nasalised adjective (lit. 'the complete whole"). A 
further Middle Irish example is De sin ro ort uile n-6g. / ind énlaith olc ecalmór (LL line 20219), 
translated 'Thereupon he slew them all entirely, the evil formidable fowls', in MD 3: 259. 
2.3 Prepositions 


GOI (§ 480) notes that 


any of the pronouns of §§ 478, 479 may be used after a conjugated preposition which is 
introduced by the copula. Examples: is do in so ‘it is for this’ (Wb. 27d20); is airi in sin ‘it 
is therefore’ (Sg. 213a1); and often is samlid in sin or sin ‘it is like that’ [. . .] But where 
there is no periphrasis, such combinations are still rare – e.g. fuiri sidi (instead of for 
suidi) (Sg. 19925), ant sin (for i-sin) (Ml. 356a1) – although later they become common.¹⁷ 


There are, then, two types: 
(a) Simple preposition + stressed sin, e.g. ar sin, fri sin, íar sin. 
(b) Prepositional pronoun + sin, e.g. and sin. 


A metrical example of (a) is: 


(12) ba sruith gruad ro ruid i sin, 
COP3sc.prer Venerable&o. s; co: CheeKyoy, AUG-reddenag¢ prey in thatpar 
fri náimtea co n-aithisib. 

‘Venerable was the cheek that reddened thereat, facing insulting enemies.’ 
(Blathm. verse 122)¹⁸ 


The fact that there are no distinctive plural forms for either sin or só means that 
type (a) can only be singular. 

While plural forms of type (b), such as dib sin, are well attested in Middle 
Irish, I have so far found no example in any early Old Irish text. The earliest 
example of this type I have is from the late Old Irish text Immram Curaig Maíle 
Dúin: Bá leis trá búaid cech cluchi díb sin (LU line 1671 [hand M]), ‘He then was 
the winner in every one of those games’, although even this is in a manuscript 
of the Middle Irish period, and is not confirmed metrically. 

In this type also, the demonstrative was stressed. All the metrically con- 
firmed examples I have are Middle Irish. It bears repeating, however, that mod- 
ern editions are inconsistent in distinguishing stressed forms from enclitics. 

Examples with the singular are (the second part of the rhyming pair in 14 
and 15 is in square brackets here and elsewhere):¹⁹ 


17 A reader notes also the prepositionless dative sin as the object of comparison after compa- 
ratives, e.g. nand maa sin a brig ‘that it is of no more account than that’, Sg. 150°1, sim. 15055. 
18 I take the word-division in the edition (ro-ruidi sin) to be a slip. 

19 Both the position of sin, etc. at the end of a line, and rind ocus airdrind rhyme confirm that 
in every one of these examples we have to do with a separate stressed word. 
(13) dena lagnib tuctha and sin. 
from-the.DAT.PL spear.DAT.PL bring.3PL.PRET.PASS in.3SG.NEUT.DAT that.DAT 
dé atat Lagin for Lagnib 
‘From the spears that were brought in that time, hence the Laigin are so 
called.’ 
(LL lines 21057-21058 [Prose Dindsenchas]; author's trans.) 


(14) a. and sin [: mórneim] (LL line 26995 [Metrical Dindsenchas]) 
b. and sin [: milid] (LL line 25712 [Metrical Dindsenchas]) 
c. and sain [: Alpain] (LL line 27837 [Metrical Dindsenchas]) 


(15) coistid riss sein [: Taltein] 
listen.IMPV to.3SG.NEUT.ACC that.ACC 
‘Listen to that!’ (LL line 27775 [Metrical Dindsenchas]) 


Examples with the plural are:²⁰ 


(16) a. Nochor bruthi bir dib sein. 
NEG-AUG-cook.3SG.PRET.PASS spit.NOM from.3PL that.DAT 
in trath tucait ón tenid 
‘Not a spit of those was cooked, when they were taken from the fire.’ 
(LL line 29245; author's trans.)²¹ 
b. ní dib sein [: tromneim] 
something from.3PL that.DAT 
‘any one of those things’ (LL line 26846 [Metrical Dindsenchas]) 


2.4 Genitive 


Unlike the other cases of the demonstrative pronoun, the genitive will have a 
noun preceding it, and this noun is usually preceded by a possessive pronoun, 
coreferential with the demonstrative, i.e. of the type a fius sin (Wb. 10°27), 
‘knowledge of that’, and the demonstrative is stressed.²² A careful distinction 


20 Further examples, with animate referents, are cited below. 

21 The edition prints díbsein, as one word. 

22 A reader notes instances in the Milan Glosses without a possessive in the case of the nominal 
prepositions i ndiad and i ndigaid ‘after’ (GOI §§ 858, 859), viz. indiadsin, 65712 (glossing proinde), 
758 (glossing proinde), 96°13 (glossing hinc), indiadsin, 20°4 (glossing sed etiam; sic manuscript, 
but emended to innadiadsin, Thes. 2: 29), and indigaidsin, 71°11 (glossing proinde). 
must then be made between two syntagms in which a demonstrative follows a 
noun, viz. the type in lebor-sin ‘that book’, with preceding article and enclitic 
adjectival demonstrative, and the type a lebor sin ‘the book of that one’, with 
preceding possessive pronoun and following stressed demonstrative pronoun. 

The only metrically confirmed example I have so far from an Old Irish text 
is with sé, cited below in (43). Nevertheless, another indication that the 
demonstrative is stressed is that the form with in can be found in this position, as 
in the following passage from the Old Irish Glossing of Senchas Már:²³ 


(17) Somuine bech . lestur lulaice, ian oil làn di mellit 7 dà thartine dec, 
a  lleth in sin ar lestar  colpthaige, 
her halfNOM theGEN.SG thatGEN for hiveACC two.year.old.heiferGEN
a trian ar lestar ndairte
‘Interest on bees, i.e. for a milch-cow hive, a pail of an ól-measure full of
hydromel, and twelve small loaves; half of that for a two-year-old-heifer 
hive; a third of it for a yearling-heifer hive.’ (CIH 920.32; author's trans.) 


Similarly, the interposition of ám dóib-sium and dam-sa shows that sain is a 
separate stressed word in these two Middle Irish examples: 


(18) Fail a mórabba ám dóib-sium sain 
be3SG.PRES its great.causeACC indeed to3PL=3PL thatGEN
‘They have indeed good cause for that.’ (LL line 12066 [Táin Bó Cúailnge]; 
author's trans.) 


(19) Fail a morabba dam-sa sain 
be3SG.PRES its great.causeACC to1SG=1SG thatGEN
‘I have good cause for that.’ (LL line 22899 [Cath Ruis na Rig]; author's 
trans.) 


Metrically confirmed examples, however, are plentiful in Middle Irish; cf.: 


(20) At-chúala co ngili gné.
dá dam Dile derscaigthe.


23 For this text see Breatnach (2005: 338-346). 
24 For the units of measurement used here see Kelly (1997: 578-580), and for mellit ‘hydro- 
mel’ (1997: 113).

Fe 7 Men fria ngairm sein
FeNOM and MenNOM to-their callingACC thatGEN
ó fail ainm ar Maig Femin

‘I have heard of the two oxen of Dil, radiant of beauty, conspicuous; Fe 
and Men are they called, whence Mag Femin gets its name.’ (LL lines 
27259-27262; trans. MD 3: 199) 


(21) rap ferr léo ná [a] silliud sain
AUG-COP3SG.PRET better with3PL than her lookingNOM thatGEN
a tabairt beo fon talmain
their puttingNOM aliveNOM.SG under-theACC.SG.MASC earthACC

‘Sooner than look upon her they had chosen to be buried under earth
alive.’ (LL line 29927; trans. MD 4: 141)²⁵


(22) a mac samla sain [: génair]
his sonNOM likenessGEN thatGEN
‘his match’ (SR line 5367; author's trans.)


(23) Is fo samla sain sunna 
COP3SG.PRES under-her likenessACC thatGEN here
‘It was after her likeness in this place.’ (LL line 21473; author's trans.) 


3 The demonstrative só, sé, in Old and Middle 
Irish 


As in the case of sin, both só and sé can be preceded by in (GOI § 478), but unlike
sin, one of the variants is correlated with case inflexion. For some comments
on the apparently free variation between sé and só, see Stifter (2015:
93-94).²⁷ The form síu, however, is found only in the dative, either with
prepositions or as an independent dative.


25 I supply in brackets the a found in two other copies (cf. MD 4: 141). 

26 The diplomatic edition prints samlasain, while MD 1: 10.61 reads fon samla-sin, with the 
article rather than the possessive and enclitic -sin, in spite of the internal rhyme with calma in 
the following line. 

27 On sé in the poems of Blathmac, see also Uhlich (2018: 64-67).

3.1 Nominative

Examples of in sé, sé and só, respectively, as the subject of a (passive) verb are:


(24) Is cian do-rairngred in sé 
COP3SG.PRES longNOM.SG.NEUT PV-prophesyAUG.3SG.PRET.PASS theNOM.SG thisNOM
no mbithe int áugaire. 
‘Long has this been prophesied: that the shepherd would be struck down.’ 
(Blathm. verse 127) 


(25) ro-comallnad uile sé 
AUG-fulfill3SG.PRET.PASS allNOM.SG.NEUT thisNOM
inge mod a thuidechtae. 
‘All this has been fulfilled save the act of his [second] coming.’ 
(Blathm. verse 233) 


Another example is found in the late Old Irish Immram Curaig Máele Duin: 


(26) In dún-ni fo-rrácbad só
INT-COP3SG.PRES for1PL=1PL PV-leaveAUG.3SG.PRET.PASS thisNOM
ol Máel Duin frisin cat
‘“Is it for us that this was left?”, said Máel Dúin to the cat.’ (LU line 1714
[hand M]; author's trans.) 


Early examples with the copula are:²⁸


(27) Ni réid la céill mbuirp 
NEG-COP3SG.PRES easyNOM.SG.NEUT with senseACC uncouthACC.SG.FEM
in se.
theNOM.SG thisNOM


‘This is not easy to the uncouth intelligence.’ (Blathm. verse 159)²⁹


28 A reader notes further examples in copula sentences in the OIr Glosses, viz. Ml. 24°4, 6127, 
706, 1044, 1229, 130°16; Sg. 203716 (all with it hé in so); Ml. 86°3, 1151 (both with it hé in sé); 
Sg. 4°12 (with it hé sé); Carlsruhe Bede 32°8 (Thes. 2: 19, with it. . . in so); Sg. 104^1, 14812 
(both in so, with zero copula).

29 Further examples with in sé are in Blathm. verses 187 and 237.

(28) reic Crist, ba drochcundrad sé.
sellingNOM ChristGEN COP3SG.PRET bad.contractNOM.SG thisNOM
‘selling Christ!—an evil bargain this’ (Blathm. verse 108)³⁰


(29) Níbu for talam a dú; 
anni as firiu 
iss ed a ngein cruchae in so 
COP3SG.PRES it theNOM.SG.NEUT birthNOM crossGEN theNOM.SG thisNOM
ro buí re ndíliu.
‘The earth is not the proper place for him: rather is this the being destined
for the cross who has been before the Flood.’ (IrGospThomas verse 33)³¹


Another example is found in the late Old Irish Immram Curaig Máele Dúin:


(30) immafoacht dó cía mulend so
PV-3SG.MASC-ask3SG.PRET to3SG.MASC what millNOM thisNOM
‘He asked him “what mill is this?”’ (LU line 1757 [hand M]; author's trans.)


Two examples of the plural are: 


(31) Derb batar é gnímae sé
certainNOM.SG.NEUT COP3PL.PRET they deedNOM.PL thisNOM.PL
do maic máir maiss, a Maire. 
‘It is certain that these were the deeds of your great beautiful son, Mary.’ 
(Blathm. verse 41) 


(32) IT e in so dano freptai inna 
COP3PL.PRES they theNOM.PL thisNOM.PL also remedyNOM.PL theGEN.SG.FEM
santi...
avariceGEN


‘These again are the remedies against avarice . . .’ (Gwynn 1914: 154-155, § 1e)


30 Further examples with sé are in Blathm. verses 20, 140 and 208. 
31 A further example with in só is in Gwynn (1914: 166.7).

3.2 Accusative


Examples of the accusative as object of a verb are: 


(33) In fer ad-chuaid in sé
theNOM.SG.MASC manNOM.SG PV-relateAUG.3SG.PRET theACC.SG thisACC
is oen a thecht torise.
‘The one who has related this is one of his faithful messengers.’ (Blathm.
verse 225)


(34) ‘Mar huath’, ol in tuath,
‘do mac do-gni sé;
your sonNOM PV-do3SG.PRES thisACC
nicon cualamar co só
nach macán am-ne.’
‘“A great terror”, said the people, “is your son who does this thing; until
now we never heard of any such little boy.”’ (IrGospThomas verse 18)³²


(35) as-ber nicon dergénus in so nó a
PV-say3SG.PRES NEG-doAUG.1SG.PRET theACC.SG thisACC or theACC.SG.NEUT
n-í aill.
one otherACC.SG.NEUT
‘who says: “I did not do this or that.”’ (Gwynn 1914: 160-161 § 23)


Late Old Irish

(36) is airi do-gniu-sa so...
COP3SG.PRES for3SG.NEUT.ACC PV-do1SG.PRES=1SG thisACC
‘the reason I do this is . . .’ (LU line 1930, hand H [Immram Curaig Máele
Dúin]; author's trans.)


3.3 Prepositions 


The situation regarding the demonstrative meaning ‘this’ is somewhat different 
to that of sin (above in subsection 2.3). The second type, prepositional pronoun + 


32 A further example with sé is in IrGospThomas verse 44.

stressed demonstrative, seems to be rare, and the only example I have from an
Old Irish text is in the Old Irish Glossing of Senchas Már: 


(37) Is airi so 
COP3SG.PRES for3SG.NEUT.ACC thisACC
ni tiagat dala huine i n-aile acht fri dithim. . . 
‘It is for this reason that matters proper to distraint with a stay of one day 
do not merge with those proper to distraint with a stay of two days, except 
in the case of delay in pound.’ (CIH 885.5; author's trans.)³³


Even in the plural, forms such as dib so (CIH 1662.36, 1701.20 [sic leg.]) ‘of 
these’ do not appear to be attested in manuscripts of the Middle Irish period.³⁴

As for the first type, viz. simple preposition + stressed demonstrative, a distinction
is made between accusative sé, as in ar sé, fri sé, etc., and dative síu, as
in íar síu, de síu. An early rhyming example of the accusative form is co sé [:
gne] (LU lines 4576-4577) ‘up to this’ (Táin Bó Cúailnge). As for the dative
form, the long diphthong íu is confirmed by úaitne ‘consonance’ with céo and
mbéo in:³⁶


(38) Rom-snádat de síu
AUG-1SG-protect3PL.PRES.SUBJ from thisDAT
ar demnaib na céo, 
céili Maic ind Rig 
a tírib na mbéo. 
‘From here may they protect me against the fog-surrounded demons, 
these companions of the King’s Son from the lands of the living.’ (Murphy 
1956: 26-27 verse 16)³⁷


33 For the legal procedure in question here see Kelly (1988: 177-179). 

34 That is, MSS written before 1200. The majority of the examples with a prepositional pro- 
noun given in DIL (S 307.5-17) are from Early Modern Irish texts. 

35 Further examples are given in Breatnach (2003: 138). Contrast co só in verse 18 of the Irish 
Gospel of Thomas cited just above. 

36 Cf. also síu ‘here’, without a preceding preposition, making úaitne with dó : fó (LL lines
4816-4819), in the poem Fothairt for trebaib Con Corb, as well as the spelling de siu (LU line
1731 [hand M]), ‘from this side’.

37 In the citation I have removed the hyphen in Murphy’s de-siu, to emphasise that siu is not 
enclitic.

A further variation is that sund can be used in place of síu in the dative singular,
as in the following selection of examples from Senchas Már:


(39) Is i sund 
COP3SG.PRES in thisDAT
con-árrachta in dá recht.
‘It is in this that the two laws have been bound together.’ (Breatnach
2017a: 32-33 § 30)


(40) Is for sund 
COP3SG.PRES on thisDAT
ro suidigthea bechbretha la Féniu
‘It is on this that bee-judgments have been established in Irish law.’
(Charles-Edwards and Kelly 1983: 88-89 § 55)


(41) Is for sund 
COP3SG.PRES on thisDAT
ro suidiged coibnius uisci t[h]airidne la Féniu.
‘It is on the foregoing [rules] that the kinship of conducted water has been
established in Irish law’ (Binchy 1955: 72-73 § 15). 


(42) conid-n-oiscfe di sunn. 
PV-3SG.MASC-alter3SG.FUT from thisDAT
‘he who shall alter it from this’ (Binchy 1966: 46-47 § 37).


This type became rare in Middle Irish; thus, for example, the only instance I
have from the extensive body of verse that comprises the metrical Dindsenchas
is ó sun immach (LL line 26633 [MD 3: 152.4]).


3.4 Genitive 


Although examples have not been easy to come by, the following Old Irish 
instance has, as in the case of sin, a possessive pronoun coreferential with
the demonstrative. Its position at the end of a line, and the rind ocus airdrind
rhyme confirm that we have to do with a separate stressed word sé: 


(43) Is ed a etarcnae sé 
COP3SG.PRES it its significanceNOM thisGEN
mac ron-ucais, a Maire, 
bid flaith cen tosach—cain n-ell!— 
ocus flaith cen nach forcenn. 
‘This is what this signifies: the son you have borne, Mary, will be lord with- 
out beginning (fair time!) and lord without any end.’ (Blathm. verse 190) 


3.5 Stressed séo 


While enclitic -seo is attested from the Milan Glosses,³⁸ the stressed form séo is
not attested in the Old Irish glosses. Some Middle Irish examples are: 


(44) Cend Guill seo at-chi im laim 
headNOM GollGEN thisNOM PV-see2SG.PRES in-my handDAT
a Laech
oh LáegVOC


‘This is the head of Goll which you see in my hand, o Láeg.’ (LL line
12726; author's trans.) 


(45) Rop hé seo Druim nElgga  n-oll 
AUG-COP3SG.PRET he thisNOM Druim Elgga greatNOM.SG.NEUT
‘This hill was known as great Druim Elga.’ (LL line 27297; trans. MD 
4: 337) 


(46) conid de seo bias Uisnech 
so.that-COP3SG.PRES from thisDAT be3SG.FUT.REL UisnechNOM
‘and hence shall Uisnech be named?’ (LL line 27637; trans. MD 2: 45.44) 


The evidence surveyed thus far indicates that the demonstratives sé / só and sin 
were usually singular in Old Irish, and accordingly that plural forms would be 
expressed by means of the deictic particle i, preceded by the article and fol- 
lowed by the demonstratives sin and síu, on which see further below. 


38 See GOI (§ 475) and Schrijver (1997b: 18).

4 Demonstratives with animate referents


According to Pedersen (1909-1913, 2: 186) the demonstratives sin and sé, etc.,
were only used with inanimate reference in Old Irish: “Die substantivischen
Gruppen in-so (in Ml. auch in-se . . .) und in-sin haben nur neutrale Bedeutung
(‘dies’, ‘jenes’) [The substantive groups in-so (in Ml. also in-se . . . ) and in-sin
only have a neuter meaning (dies ‘this’ (neuter), jenes ‘that’ (neuter))].” While
examples with inanimate referents are plentiful in Old Irish, there are nevertheless
some instances with animate referents, although they are not very common.
All those I have noted are as the subject of copula:


(47) conid e epscop in sin 
so.that-COP3SG.PRES he bishopNOM theNOM.SG thatNOM
citaru oirtned la Laigniu.
‘so that he is the bishop who has been first consecrated in Leinster’ (Thes.
2: 241.15)


(48) Is é remibí bóairechaib in sin 
COP3SG.PRES he PV-be3SG.PRES.HAB bóaireDAT.PL theNOM.SG thatNOM
‘that is one who takes precedence over other bóaires’ (Binchy 1941:
10.248)³⁹


(49) Sich in suí Sacharias: 
‘Amrae mac in so;
wonderfulNOM.SG.MASC boyNOM theNOM.SG thisNOM
ma for-cantae bed amrae 
fri sodain 22-720.” 
‘Said the sage Zacharias: “This is a wonderful boy; were he to be taught 
he would be more wonderful still."' (IrGospThomas verse 22) 


The referent can also be plural, as in: 


(50) It e mna in so 
COP3PL.PRES they womanNOM.PL theNOM.PL thisNOM.PL
na dlegut log n-eneach 


39 From Crith Gablach; further examples, all with in sin, are at lines 280, 350, 448, 459, 475 
and 593.

‘These are women who are not entitled to honour-price.’ (CIH 538.19
[Senchas Mar]; author's trans.)⁴⁰


4.1 Animate uses of sin 


In Middle Irish, however, examples are much easier to come by, and in what 
follows I separate the examples of sin from those of sé, só.


4.1.1 Nominative singular sin as subject of copula


(51) ‘Can don mnai?’ ar cach.
‘Mathair Branduib in sin’, ar Aedan.
motherNOM BrandubGEN theNOM.SG thatNOM
‘“Whence is the woman?”, said all. “That is Brandub's mother”, said
Aedan.’ (Meyer 1899: 135, 137 § 9)


(52) mac sin Bressail Bélaich bind.
sonNOM thatNOM BresalGEN BélachGEN.SG.MASC melodiousGEN.SG.MASC
‘The latter was the son of melodious Bresal Bélach.’ (O’Brien 1952: 161, 167
verse 12c)


(53) Dúalderg ingen Mairge Móir,
ben sein Smucailli meic Smoil,
wifeNOM thatNOM SmucailleGEN sonGEN SmólGEN
‘Dúalderg, daughter of Marg the Great, she was the wife of Smucaille, son
of Smól.’ (LL line 28898; trans. Ó Murchadha 2009: 23 verse 70)


4.1.2 sin as object of a transitive verb


(54) Fergus Lethderg ro slait sein [: anmain]
FergusNOM side.redNOM.SG.MASC AUG-spoil3SG.PRET thatACC
‘It was Fergus Red-side that spoiled her.’ (LL line 29020; trans. MD 4: 251)


40 Similarly, CIH 43.10. 




(55) gabaid sin ol se 7 berid a chend de
take2PL.IMPV thatACC says he and bring2PL.IMPV his headACC from3SG.MASC
‘“Take hold of that person,” said he, “and remove his head from him.”’
(Atkinson 1887: 643 [RIA MS 23 P 16 (Leabhar Breac) folio 9°9]; author’s 
trans.) 


4.1.3 Nominative plural sin as subject of copula 


(56) Deich meic sin do Chathair chrúaid
ten sonNOM.PL thatNOM of CathaírDAT sternDAT.SG.MASC
‘Those are the ten sons of stern Cathair’ (LL line 26022; trans. MD 4: 285)


4.1.4 Nominative plural sin as subject of a passive verb 


(57) Na torothair dano techtait dá chorp i n-óenaccomol
deligfitir sin tall isind eséirgi
separate3PL.FUT.PASS thatNOM.PL beyond in-theDAT.SG.FEM resurrectionDAT
‘The monsters also, that have two bodies in one union, they will be sepa- 
rated beyond in the Resurrection.’ (LU line 2562 [hand H] [Scéla na 
Esérgi]; trans. Stokes 1904: 239). 


(58) cethri sessir garga a ngluind 
ro marbtha sin la Drecuinn 
AUG-kill3PL.PRET.PASS thatNOM.PL by DrecoACC
‘Four times six—fierce their deeds! these were slain by Dreco.’ (LL line
30502; trans. MD 4: 15) 


(59) ro slechta na sechtaib sain
AUG-slaughter3PL.PRET.PASS in-their sevenDAT.PL thatNOM.PL
‘They were slain in their sevens.’ (LL line 26094; trans. MD 3: 99) 


(60) is dia réir ra seolta sain. 
COP3SG.PRES to-their willDAT AUG-send3PL.PRET.PASS thatNOM.PL
géill na Eurpa co Crúachain
‘in express submission to them have been sent hostages from all Europe
to Cruachu’ (LL lines 20690-20691; trans. MD 3: 348)

4.1.5 Nominative plural sin as subject of an intransitive verb


(61) Is aire condeochatar sin i
COP3SG.PRES for3SG.NEUT.ACC that-come3PL.PRET thatNOM.PL in
comdail Con Culaind
meetingACC.SG CúGEN CulannGEN
‘The reason they came to encounter Cú Chulainn was...’ (LL line 8796
[Táin Bó Cúailnge]; author's trans.)


(62) Ro scáchatar sin uile 
AUG-depart3PL.PRET thatNOM.PL allNOM.PL.MASC
nocho mair dib oenduine
‘All those have departed; not a single one of
them remains.’ (Meyer 1912: 218 verse 23)


4.1.6 Plural sin after a prepositional pronoun 


(63) Cid úadib sain no gairthe.
even from3PL thatDAT.PL PV-call3SG.IMPF.PASS
eter slúagaib samaigthe
‘Even from them it was called among leaguered hosts.’ (LL lines
25401-25402; trans. MD 3: 23)⁴¹


(64) rí díb sin [: Femin]
kingNOM of3PL thatDAT.PL
‘a king of those’ (LL line 29807 [Metrical Dindsenchas])


(65) cid atai dóib sin beus 
what PV-be2SG.PRES to3PL thatDAT.PL still
‘Why are you still angry with them?’ (LL line 8367; author’s trans.) 


4.2 Animate uses of sé/só 


I give next plural forms of sé, só; the singular forms are included in the final 
section of this paper. 


41 The internal rhyme úadib : slúagaib establishes that sain is a separate word.

4.2.1 Nominative plural só as subject of a transitive verb


(66) ar cech n-omgnim gniset so 
for everyACC.SG.MASC cruel.deedACC do3PL.PRET thisNOM.PL
sniset a comlín chucco
‘For every cruel deed they did, they [the Tuatha Dé] inflicted the like number
upon them.’ (LL line 25163; trans. MD 3: 5)


4.2.2 Nominative plural sé/só as subject of copula 


(67) it iat in so rig na
COP3PL.PRES they theNOM.PL thisNOM.PL kingNOM.PL theGEN.PL.NEUT
cóiced batar acond feis-sin
provinceGEN.PL be3PL.PRET.REL at-theDAT.SG.FEM feastDAT=DIST
‘These are the provincial kings who were at that feast.’ (LL line 37651 
[Bórama]; author's trans.) 


(68) Coic rig coicat sáethra[i]ch sé
five kingNOM.PL fiftyGEN laboriousNOM.PL.MASC thisNOM.PL
do laechraid na Cristaide
‘Five and fifty kings—laborious these!—of the warriorhood of
Christendom’ (LL line 25209; trans. MD 3: 9)


4.2.3 Genitive plural so 


Two examples of the genitive plural in Middle Irish commentary on 
Senchas Mar are: 


(69) a. 7 fo coruib so uili teacar
and under-their contractDAT.PL thisGEN.PL allGEN.PL.MASC come3SG.PRES.PASS
‘And it is the contracts of all of these that are impugned.’ (CIH 1794.15;
author's trans.)

b. Tecur fo coruib so sís
come3SG.PRES.PASS under-their contractDAT.PL thisGEN.PL below
‘The contracts of all of these below are impugned.’ (CIH 1833.30;
author’s trans.)

5 Demonstratives with the deictic particle í


The deictic particle í followed by a demonstrative can qualify a noun or, combined 
with the article alone, can be used as a substantive. The former type is discussed in 
GOI (§ 475.2), where the examples are punctuated in fer hi-siu, in fer hi-sin, etc., and
the latter in GOI (§ 476), with the punctuation int-i-siu, ind-i-siu, an-i-siu, etc. An
immediate problem with this interpretation of the demonstratives as enclitic is why 
the form for 'this' should be -siu, when the enclitic forms otherwise are -so, -sa, -se, 
-seo and -sea. In actual fact, there is enough metrical evidence to confirm that both 
í and sin are stressed in this combination. If the word for ‘that’ is stressed, so also
must the word for ‘this’, and accordingly the spelling siu is to be read with a long 
diphthong, viz. síu, the (independent) dative singular form of sé, só. 
A metrical example which confirms that í is a stressed word is: 


(70) in chroch hí as-mbeirid-si
nos rega int í
PV-3SG.FEM-go3SG.FUT theNOM.SG.MASC one
doda-rodcht do ráith cháich
do thaithchreic cach bí.
‘That cross you speak of, he will suffer it who has come to it for the sake
of all to redeem every living creature.’ (IrGospThomas verse 39)


I have one instance from an Old Irish text, and two from Middle Irish texts (the 
first of these is early Middle Irish), where its position at the end of a line and 
rhyme confirm that sin following í is a separate stressed word:


(71) bed Ísu ainm ind i sin, 
COP3SG.IMPV Jesus nameNOM theGEN.SG.MASC one thatDAT.SG
don domun bid sláinicith.
‘Let Jesus be his name, he will be the saviour of the world.’ (Blathm. verse
155)

(72) abuir fri Maol a n-í sin. 
speak2SG.IMPV to MáelACC theACC.SG.NEUT one thatDAT.SG
a oghriar ó Aodh a fhir 


‘Tell that to Máel, [he will have] all he wishes for from Áed, o man.’ 
(Byrne 1908: 70.14; author's trans.) 


42 With full rhyme (deibide nguilbnech) between sin and fhir.

(73) Airigis Tadc a nni sin
observe3SG.PRET Tadg theACC.SG.NEUT one thatDAT.SG
eatha úad co Mag Femin
cosin scél-sin a Leath Chuind 
co sil nAililla nUlaim 
‘Tadg observed that; messengers were sent by him to Mag Femin, with 
that story from Conn’s Half to the descendants of Ailill Ulam.’ (RIA, MS 23 
P 2 [Book of Lecan] folio 22210; author's trans.)⁴³


6 Middle Irish analytic forms of the verb


In a discussion of the rise of the use of independent pronouns to mark both the 
subject and the object of verbs in Middle Irish, and the origin of the pairs sé/é, 
si/i, etc., in the third person forms, where Old Irish had only one form (é, sí, 
etc.), Greene (1958: 111) remarked: “Probably the forms ol sé, ol sí (which cer- 
tainly had fully stressed pronouns by this time, whatever the situation may 
have been in Old Irish) also contributed to the new development; it was cer- 
tainly they which determined that the s- forms of the third person pronouns 
should be used as subjects immediately following active verbs.”⁴⁴

In ol sé, Middle Irish ar sé ‘inquit’, the sé most likely was historically the 
demonstrative, as Quin (1960) argued, although following the then current un- 
derstanding of the form in question as se, with a short vowel. Nevertheless, by 
the Middle Irish period it had been assimilated to the personal pronoun, as can 
be seen from the following two lines in Saltair na Rann, where sé is used to 
mark direct speech by Adam, but when direct speech by Eve is reported, sí is 
used:⁴⁵


(74) ar sé, ar Adam, fria dagmnai
says heNOM says AdamNOM to-his good.wifeACC
‘said he, said Adam, to his good wife’ (SR line 1306; author's trans.)


(75) ar si ar Eua fri Adam 
says sheNOM says EveNOM to AdamACC
‘said she, said Eve, to Adam’ (SR line 1942; author's trans.)


43 From a poem in the account of the battle of Crinna. 
44 See also Roma (2000b). 
45 Cf. Mac Cana (1984).

Some years ago, I showed that the form of the stressed demonstrative pronoun
is in fact sé, with a long vowel, and suggested that this Old Irish word had 
some role to play in the development of the homonymous independent pro- 
noun sé ‘he’ beside é in Middle Irish (Breatnach 2003: 140). Furthermore, the 
gradual disappearance of sé as a demonstrative in Middle Irish may well indicate
a shift in function.⁴⁶ Interestingly, there are quite a few instances in
Middle Irish texts of ambiguity in the case of sé, that is, where it is not entirely 
certain whether we have to do with the demonstrative pronoun or with the 
third singular masculine pronoun. 

Given that the development of the pairs sé/é, sí/í, etc. must have taken 
some time, it is more likely that the first four examples below, taken from texts 
belonging to the late Old Irish /early Middle Irish period, are of the demonstra- 
tive pronoun, although, at the same time, it is not difficult to see them as equiv- 
alent to the pronoun é. 


(76) Ruire echtach Eassa Rüaidh, 
immo tteccraitís morsliaigh, 
ass-ib digh mbáis baeghlach sé, 
PV-drink3SG.PRET drinkACC deathGEN dangerousNOM.SG.MASC thisNOM
iar ccradh ui Iese. 
‘The great-deeded chieftain of Eas-Ruaidh, about whom great hosts used 
to assemble, he took a lethiferous drink dangerous truly, after persecuting 
the descendant of Jesse (i.e. Christ).’ (O'Donovan 1856: s.a. 899)⁴⁷


(77) Ni sé sním fil fom 
NEG-COP3SG.PRES thisNOM troubleNOM be3SG.PRES.REL upon1PL
‘This is not what distresses us.’ (Mulchrone 1939: line 116, Stokes 1887: 11 
[Vita Tripartita]) 


(78) 7 ro mbaitsi Pátraic oc Sangul .i. sain aingel⁴⁸ dodechoid día acallaim-sium
alla sin
7 ni sé Uictor
and NEG-COP3SG.PRES thisNOM VictorNOM


46 Examples of co sé are occasionally found in Middle Irish and Classical Modern Irish (go sé); 
see Breatnach (2003: 138). 

47 With slightly altered punctuation and word-division, and the addition of macrons over 
long vowels. In the text ui Iese (leg. ui Iése) is glossed .i. Criost. 

48 This serves, of course, as an etymology of the place-name Sangul.

‘And Patrick baptised him at Sangal; that is a different angel went to converse
with him on that day, and it is not Victor.’ (Mulchrone 1939: line
2417, Stokes 1887: 207 [Vita Tripartita]) 


(79) Trí coicait laech... 
ba sé lucht línaib dindgna 
COP3SG.PRET thisNOM contentNOM numberDAT.PL fortressGEN.PL
cach imda de suidib
‘Thrice fifty heroes ... that was the tale, according to the counts of fortresses,
in every chamber of the number.’ (LL lines 3565-8; trans. MD 1: 33)⁴⁹


(80) Secht cubait . . . 


ba sé tomus in tellaig. 
COP3SG.PRET thisNOM measureNOM theGEN.SG.NEUT hearthGEN
‘Seven cubits . . . that was the measure of the hearth’ (LL lines
3573-3576; trans. MD 1: 32-33, lines 53-56)


In examples from later Middle Irish texts, sé could be taken as the pronoun, 
used in positions where é subsequently came to be used after the distribution 
of sé and é was regularised, although some of those below could just as well be 
read as demonstratives:⁵⁰


(81) ba sé iath con-atchetar 
COP3SG.PRET thisNOM landNOM PV-askAUG.3PL.PRET
‘that was the land they asked for’ (LL line 19708; trans. MD 3: 441)⁵¹


(82) ar cipé ti bid sé fot a sáeguil
for whoever come3SG.PRES.SUBJ COP3SG.FUT thisNOM lengthNOM his lifeGEN
‘for whoever so comes, that will be the length of his life’ (O’Rahilly 1967:
180y-z [LL line 9081 (Táin Bó Cúailnge)])


49 This and the following example are from the poem Domun duthain a lainde, which al- 
though edited by Gwynn (MD 1: 28-37) as the fourth poem on Tara, is not part of the 
Dindsenchas proper in LL; the language is earlier than that of the Dindsenchas as a whole,
either late Old Irish or early Middle Irish. 

50 For the forms with and without s-, the latter normally being used where the pronoun is the 
subject of the copula or a passive verb, and deviations from the norm, see Breatnach (1994a: 274). 
51 Note that Gwynn reads the variant, ba hed íath conaitchetar (MD 3: 440), with the third 
person neuter pronoun.

(83) Cocholl Manc[h]in, cid mait se
cowlNOM ManchínGEN however-COP3SG.PRES good thisNOM
‘The cowl of Manchin, however good this is.’ (Meyer 1892: 129.1; author's 
trans.) 


(84) Ro cürad ro sedlad sé. 
AUG-chastise3SG.PRET.PASS AUG-maim3SG.PRET.PASS heNOM
ro dedlad ra dóenmige 
‘He was chastised, he was maimed, he was parted from his misery.’ 
(LL line 25708; trans. MD 3: 69) 


(85) Caelchéis diaro sernud sé 
CaelchéisNOM when-AUG-dispose3SG.PRET.PASS heNOM
‘When Caelcheis was driven abroad’ (LL line 30203; trans. MD 3: 439)⁵³


I finish this collection with two examples of sé as subject, where not only is it 
separated from its verb (and thus é might be expected), but also the translation 
‘this’ is more appropriate than ‘it’: 


(86) Dia lod d'iarair mo leigis 
iar mbliadain rüin ro gabus 
rom chuir hi seirg seimne sé 
AUG-1sG-""putss;,»., in Wwastingacc ***5g s; thiSnom 
him-meirbe ocus him-migne. 
‘When I went to seek my cure, after a year, I had kept a secret, which had
thrown me into a wasting, into feebleness and into an evil state.’ (Meyer
1903: 48, 5286) 


(87) conos tuc i súanbás sé
so.that-3PL-bring3SG.PRET in sleep.deathACC thisNOM
céol ro chachain Craiphtine
‘so that this brought them into a death-sleep, the music which Craiphtine
played’ (Ó Cuiv 1966: 174)


52 The fifth line of a six-line verse in the version of Aislinge Meic Con Glinne in Trinity College 
Dublin Manuscript H 3. 18; here se makes end-rhyme with dé and nglé. 
53 Note that Gwynn reads the variant si (MD 3: 438.13).

In conclusion, the simplest way to account for these examples as a whole is to
take them as exemplifying sé ‘this’ in the course of a gradual shift from demonstrative
pronoun to a personal pronoun in complementary distribution with é.⁵⁴
Only in some cases, such as when sé is combined with sin, as in Ba sé sin búar
Flidais (LU line 1632), ‘That was the cattle of Flidais’, is it clear that sé is the
pronoun and not the demonstrative.⁵⁵ Similarly, we cannot be absolutely certain
of the existence of analytic forms of the verb until we find first and second
person pronouns used as the subject of a verb, and in the case of the third person,
a plural pronoun, a metrically confirmed stressed feminine singular sí, or
a masculine singular hé used as the subject. All attestations of such forms are
in the late 12th-century Book of Leinster.⁵⁶

While this paper is by no means intended to be a comprehensive account of 
the demonstratives in Old and Middle Irish, I hope to have gone some way to- 
wards elucidating the phonology and range of use of sé/só and sin, and the de- 
velopment of the independent pronoun and the analytic forms of the verb in 
Middle Irish, as well as providing possible dating criteria for texts. I will end by 
stressing that in all such work it is essential to use all the means available, es- 
pecially metrics, to determine Old and Middle Irish forms, and not simply to 
assume that what holds for Modern Irish also held for the earlier period. 


54 This complementary distribution of two forms of separate origin, which differ only in the 
presence or absence of initial s-, will have formed the basis for the creation of the other pairs 
of forms of the independent pronoun, first in the third person forms, viz. si/i, sed/ed, and síat/ 
iat, and eventually in the first and second person plural forms, viz. sinn/inn, sib/ib, for which 
see Breatnach (1994a: 274, 429). 

55 That these are two separate stressed words is shown by the rhyme in se sein with nimib in 
SR lines 195-196. 

56 See Breatnach (1994a: 272-273 § 10.19) and Breatnach (2015: 72-73).

6 Paradigmatic split and merger:
The descriptive and diachronic problem 
of Old Irish Class B infixed pronouns 


1 Introduction: Infixed pronouns and clause types 
in the Old Irish verbal complex 


Infixed pronouns are one of the formal strategies used in the Old Irish verbal 
complex to distinguish clause types, in such a manner that the sole formal op- 
position between Classes A/B and Class C serves to express the opposition be- 
tween declarative and relative clause type respectively in lexical compound 
verbs which take an infixed pronoun. 

The use of the Classes A and B of infixed pronouns, which mark declarative 
clause type, is determined by the phonotactic structure of the (first) lexical pre- 
verb of the verbal compound appearing in the pretonic part of the verbal com- 
plex. The general rule is that lexical preverbs which end with a consonant 
(henceforth also (-)VC- lexical preverbs), with the exception of imm- and ar-, 
take Class B infixes to express declarative clause type. For instance, the Old 
Irish verb as-beir ‘(s)he says’, with the (-)VC- preverb as-, takes the Class B third 
person singular neuter infixed form -tᴸ-, so that a(s)- + -tᴸ- > at- in at-beir ‘(s)he
says it’. By contrast, lexical preverbs which end with a vowel (henceforth also 
CV- lexical preverbs) make use of the so-called Class A of infixed pronouns, so 
that e.g. the lexical compound do-beir ‘(s)he brings, gives’, with the lexical pre- 
verb (to- >) do-, expresses the same third person singular neuter pronominal 
reference by replacing the -o- vowel of the lexical preverb with the correspond-
ing Class A infixed form -aᴸ-, i.e. d(o)- + -aᴸ- > da-beir ‘(s)he brings, gives it’.
The lexical preverbs imm- and ar- also make use of Class A of infixed pronouns. 
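
The selection rule just stated can be restated as a small decision procedure. The following Python sketch is purely illustrative: the function name, the set of exceptions as coded here and the simplified vowel test are assumptions of this example rather than part of the analysis, and the preverb is expected in its pretonic citation form (e.g. do-, as-).

```python
# Illustrative sketch only: names and the vowel inventory are assumptions.
VOWELS = set("aeiouáéíóú")
# (-)VC- preverbs that nevertheless take Class A (see the rule above).
CLASS_A_EXCEPTIONS = {"imm", "ar"}


def declarative_infix_class(preverb: str) -> str:
    """Return 'A' or 'B' for a lexical preverb such as 'do-' or 'as-'."""
    stem = preverb.strip("-").lower()
    if not stem:
        raise ValueError("empty preverb")
    if stem in CLASS_A_EXCEPTIONS:
        return "A"
    # CV- preverbs end in a vowel and take Class A;
    # (-)VC- preverbs end in a consonant and take Class B.
    return "A" if stem[-1] in VOWELS else "B"


for pv in ("do-", "as-", "imm-", "ar-", "con-"):
    print(pv, declarative_infix_class(pv))
```

The sketch only covers the choice between Classes A and B in declarative forms; it says nothing about Class C or about conjunct particles.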

The relative forms which minimally contrast in clause type with the above 
mentioned declarative forms at-beir ‘(s)he says it’ and da-beir ‘(s)he brings, 
gives it’ are ass-id-beir ‘who says it’ and do-d-beir ‘who gives it’ respectively, 
which include the mentioned Class C forms of the third person neuter infixed 
pronoun (i.e. -[i]dᴸ-).

The ultimate aim of this study is to explain the diachronic origin of Class 
B of infixed pronouns, but, as a prerequisite for this, and also as a point
which is in itself worth discussing, the exact morphological and syntactic cir- 
cumstances of these infixed pronouns must also be investigated in the corpus 
of the contemporaneous Old Irish glosses. This descriptive question can be
briefly referred to as follows. The distinction between Classes A/B (for de- 
clarative clause type forms) and Class C (for relative clause type forms) is 
quite regularly made in Old Irish when a third person pronoun is infixed in 
the lexical compound. However, things are different with a first or second 
(henceforth also non-third) person pronominal infix. In the language of the 
contemporaneous Old Irish texts, non-third person infixed pronouns are 
much less regular in making that distinction between declarative and rela- 
tive form and show a very remarkable behaviour especially when the lexical 
preverb after which the infixed pronoun appears is of type (-)VC-, that is to 
say, when the declarative clause type infixed pronoun must be of Class B. In 
that situation, Class B is most often used in cases in which relative clause 
morphology (i.e. a Class C form) is expected. Though less frequently, non- 
third person Class A infixed pronouns also appear in cases in which relative 
morphology is expected. 

The descriptive question is then how to deal with this asymmetry in the 
use of Classes C and B depending on whether the involved infix is of a third 
or non-third person, and the position taken in this paper is that this situa- 
tion of asymmetry observed in the contemporaneous Old Irish texts is directly 
related to the question of the diachronic origin of the Class B of infixed
pronouns. 

In order to answer this question, sections 2 to 4 provide a detailed descrip- 
tion of the situation of Class B pronouns as they are used in the Old Irish 
glosses. In particular, section 2 provides basic information about the Old Irish 
verbal complex and the category of clause typing expressed in it; section 3 pro- 
vides a list of lexical preverbs which take either Classes A or B of infixed pro- 
nouns, presents the whole paradigm of the three classes of infixed pronouns, 
and illustrates the use of Classes A/B instead of the expected Class C; section 4 
lists the forms attested in the three main collections of glosses which show a 
non-third person infixed pronoun after a pretonic lexical preverb of the (-)VC- 
type, that is to say, of verbs which must take Class B for the declarative clause 
type forms. On the basis of the previous description, section 5 gives a proper 
formulation for the diachronic question referred to above, discusses previous 
diachronic explanations, and provides some basic aspects regarding the ety- 
mology of the other classes of infixed pronouns, as well as of some lexical pre- 
verbs. Section 6 elaborates a diachronic explanation for the Old Irish Class B of 
infixed pronouns which is congruent with the previous description and which 
also provides a justification for this formal distinction in the infixed pronouns 
used for declarative clause type. Section 7 summarises the main points of the
paper.
2 Some background about the Old Irish
verbal complex 


The initial statement of this paper is, as stated above, that Classes A/B of in- 
fixed pronouns are used for declarative and Class C for relative clause types. 
This section only refers to two issues on these pronominal references. For more 
aspects of the Old Irish verbal complex, I refer to the treatment in García-
Castillero (2012, 2014, 2015, 2020).

The first issue is that the Old Irish infixed pronouns are morphological ele- 
ments which always appear after a previous pretonic element, which may be a 
conjunct particle (i.e. a pretonic element of a grammatical nature), or a lexical pre- 
verb, which constitutes a semantic unit with the verbal stem (see Table 1 below on 
page 148). Handbook examples of the combination of a verb with conjunct par-
ticles are ní-beir and nad-beir, from the simple beirid ‘brings’. Both being the third
person singular of the present indicative active, the former is marked as a negative 
declarative clause type form (‘[s]he does not bring’), and the latter as a negative 
relative clause type form (‘who does not bring’ or ‘whom/which s/he does not 
bring’). The form nad-beir must be understood as including relative lenition (i.e. 
the change of underlying /b'/ to /v'/ producing /nadˈv'er'/), although this mutation
is not graphically marked in Old Irish when it applies to voiced obstruents. 
Relative lenition involves the phonological fricativisation of the first consonant of 
the basic form of the verb, in this case a voiced bilabial plosive; the sound /f/ 
is deleted and /s/ becomes an aspiration; vowels are not affected by lenition. 
The other relative mutation used in the Old Irish verbal complex is the so- 
called relative nasalisation, which formally involves the addition of a nasal 
sound to a voiced plosive (i.e. nad-mbeir), or to a vowel, and the voicing of a 
basic voiceless plosive. The functional side of these two relative mutations 
does not need to be considered now. The important point for the use of Class 
B infixes is that they are only combined with the lexical preverbs of the type (-)VC-
to be observed in the next section, whereas Class A is combined with lexical 
preverbs (of the type CV-) and conjunct particles; by themselves, these two 
classes express declarative clause type. Class C infixes, which express relative 
clause type, are combined with lexical preverbs of whichever phonotactic 
type and conjunct particles. 

The second issue is related to the two grammatical categories which cross- 
cut in the infixed pronoun, which must therefore be considered basically as a 
portmanteau morpheme expressing pronominal reference and clause type at 
the same time. The Old Irish verbal complex regularly distinguishes six clause 
types by means of several formal procedures. The two most important clause 
types for this paper are the declarative and the relative (where the leniting and
the nasalising variants mentioned must be included); in addition, the Old Irish 
verb distinguishes content (or wh-)interrogative, polar (or yes-no) interrogative, 
responsive and imperative clause type forms. Some of these clause types will be 
mentioned later, and are characterised by the use of one of the classes of pronom- 
inal infixes, as also detailed in the next section. As for the distinction between 
declarative and relative clause types, it must be stated that the formal opposition 
between Classes A/B and C is one of the formal strategies which suffice by them- 
selves to distinguish those two clause types in the Old Irish verbal complex, as 
illustrated in the examples of (1) of the next section. The other formal means are 
different sets of endings (the so-called absolute endings, where there are both de- 
clarative and relative absolute endings), the so-called relative mutations (which 
contrast with the lack of them), as well as special conjunct particles such as the 
negative declarative ní- and the negative relative nad- mentioned above. 


3 Formal and functional aspects of the Old Irish 
Classes A, B and C of pronominal infixes 


This section provides the basic descriptive tools to understand properly the prob- 
lems considered in this paper. Section 3.1 illustrates the basic distinction referred 
to in the previous section with attested forms including third person singular 
neuter infixed forms. In section 3.2 the whole set of the infixed pronouns used 
in the Old Irish verbal complex as well as the main issues of their use are in- 
troduced, paying special attention to the formal features of Classes B and C. 
Section 3.3 centres on the morphological process which must be assumed in the 
expression of the third person singular masculine / neuter of Class A infixed pro- 
nouns. Section 3.4 establishes the proper context in which the asymmetry be- 
tween third and non-third persons mentioned at the outset must be considered, 
and focuses on the use of Class A instead of expected Class C as a special case of 
the general phenomenon which involves the possibility of using either declara- 
tive or relative clause type marking in the same syntactic context. 


3.1 Basic functional distinction between Classes A/B and C 


Classes A and B basically mark the corresponding verbal complex as a declara- 
tive clause type verb, as already stated, and are also used in the imperative verb, 
which makes use of a partly different set of inflectional endings. The pronominal 
infixes of Class C are used to mark relative clause type and some other subordi-
nate clauses, and are also used in polar interrogative clause type forms, which
are constitutively marked by the conjunct particle inᴺ-. In lexical compounds
which have no conjunct particle in the pretonic slot, i.e. which have a lexical pre- 
verb in the pretonic slot, this difference between Class A/B and Class C expresses 
by itself the difference between declarative and relative clause type respectively. 
The forms in (1) and (2) illustrate this clause type difference in minimal or quasi- 
minimal pairs of forms attested in the Old Irish glosses. 


(1) a. darigni
PV-3SG.NEUT(A)-do.AUG.3SG.PRET
‘(David) has done it.’ (Ml. 5192)

b. dudrigni
PV-3SG.NEUT(C)-do.AUG.3SG.PRET
‘who has done it’ (Ml. 124°3)


(2) a. air atroilli dia
for PV-3SG.NEUT(B)-deserve.3SG.PRES God.NOM
‘for God deserves it . . .’ (Ml. 51712)
b. donaib hí assidroillet
to-the.DAT.PL one PV-3SG.NEUT(C)-deserve.3PL.PRES
‘to those that deserve it’ (Ml. 546)


The forms in (1) are both based on the third person singular perfect active do-r-
igni ‘(s)he has done’, of do-gni ‘does, makes’, where do- is a CV- lexical preverb.
In (1a), this form takes the third person singular neuter infix of Class A, whereas 
in (1b), it takes the corresponding Class C form. The two examples of (2) include 
forms from the verb ad-roilli ‘deserves’. In (2a), at-roilli has the Class B third 
person singular neuter infix and therefore counts as a declarative clause type 
form. Note, in the relative form assidroillet in (2b), the use of the preverb as- 
instead of the original ad-: as stated by Thurneysen (GOI § 822), this is due to 
the loss of formal distinctiveness between those preverbs as- and ad- when 
they are combined with Class B infixed pronouns, both with the form at-: cf. 
at-roilli ‘[God] deserves it’ in (2a) and the form at-beir ‘(s)he says it’ quoted in
the introduction, which belongs to the basic form as-beir ‘(s)he says’. Though
this is not exactly the case of loss of formal distinctiveness assumed later on in
this paper as the trigger of the creation of the Old Irish Class B of infixed pro- 
nouns, it illustrates how formal distinctions can be lost in the pretonic part of 
the verbal complex. 
3.2 The Old Irish lexical preverbs and the distribution
of Classes A/B 


As also anticipated at the outset, the phonotactic structure of the (first) lexical pre- 
verb of the basic lexical compound decides the shape of the infixed pronoun ex- 
pressing declarative clause type, i.e. whether Classes A or B are to be used. 
According to the description in GOI (§§ 411–412), the general rule is that, if the lexi-
cal preverb ends in a vowel, i.e. if it has the shape CV-, then Class A is used; if the
lexical preverb ends in a consonant, i.e. (-)VC-, then the infixed pronouns of Class B
are used. The exceptions are imm- and ar-, which bear Class A infixes and which
originally had the shape CV-, i.e. which come from a form with original vocalic auslaut.
Table 1 includes the most relevant lexical preverbs in the shape that they adopt in 
Old Irish in declarative clauses without infixed pronoun,¹ and - for the (-)VC-
preverbs - it also gives the assumed Primitive Irish form between parentheses and 
the basic form that they adopt in combination with a Class B infixed pronoun. 


Table 1: Old Irish lexical preverbs and their infix class in declarative 
clause type form. 


Lexical preverbs with Class A          Lexical preverbs with Class B

CV-                  (-)VC-            (-)VC-

ro-                  imm- ‘about’²     (*kom-) con- ‘with’ > cot-
(to >) do- ‘to’      ar- ‘for’         (*in[d]-) in(d)- ‘in’ > at-
di/e-/do- ‘from’                       (*ad-) ad- ‘to’ > at-
fo- ‘under’                            (*ath[i]-) ad- ‘re’ > at-
                                       (*ess-) as- ‘out of’ > at-
                                       (*uss-) as- ‘up, out’ > at-
                                       (*frith-) fris- ‘towards’ > frit-
                                       (౧౧ for- ‘over’ > fort-/d-
                                       (*eter-) eter- ‘between’ > etart-/d-


The meanings adduced must be understood as approximate and, to a great ex-
tent, etymologically based. In not a few compounds, however, these meanings
have been blurred, so that they can no longer be distinguished.
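
The right-hand part of Table 1 can also be read as a lookup table, which makes the formal merger of several (-)VC- preverbs before a Class B infix easy to see. The Python sketch below is only an illustration: the dictionary and function names are assumptions of this example, the two preverbs ad- and the two preverbs as- are each entered once because they are identical in shape, and the notation fort-/d- and etart-/d- is expanded, by assumption, into two spellings each.

```python
from collections import defaultdict

# Pretonic preverb -> form(s) taken before a Class B infix, copied from Table 1.
CLASS_B_BASE = {
    "con-": ["cot-"],
    "in(d)-": ["at-"],
    "ad-": ["at-"],      # covers both ad- 'to' and ad- 're'
    "as-": ["at-"],      # covers both as- 'out of' and as- 'up, out'
    "fris-": ["frit-"],
    "for-": ["fort-", "ford-"],
    "eter-": ["etart-", "etard-"],
}


def merged_preverbs(table):
    """Return the Class B forms that are shared by more than one preverb."""
    by_form = defaultdict(list)
    for preverb, forms in table.items():
        for form in forms:
            by_form[form].append(preverb)
    return {form: pvs for form, pvs in by_form.items() if len(pvs) > 1}


print(merged_preverbs(CLASS_B_BASE))
```

On the data of Table 1, the only shared form is at-, which corresponds to in(d)-, ad- and as- alike.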


1 Class B is also found in combination with the de-adjectival preverb mí- ‘badly, mis-’ in mit-
nimret ‘that they deceive him’ (Ml. 74°22 = mi-t-Nimret, from the verb imm-beir ‘plays, han-
dles’). See García-Castillero (2014) for this type of preverbal element, which is much less
frequent than the conjunct particles and lexical preverbs. 
2 For this semantic interpretation, see Russell (1988: 125). 
Although more preverbs bear Class B, as can
be observed in Table 1, the ones taking Class A are more frequent, i.e. there are
more verbal compounds with those lexical preverbs. In addition to that, whereas 
Class B is only used with the (-)VC- lexical preverbs, Class A is regularly used 
also with some very frequent conjunct particles such as the declarative negative 
particle ní- seen in the previous section, or the meaningless particle no-, one of
whose main functions is precisely to provide a pretonic element after which the 
equally unstressed infixed pronoun can appear. 

The combination resulting from the preverb with the Class A infix consti- 
tutes a phonotactically adequate sequence in most forms of the corresponding 
paradigm. The preverbs imm- and ar-, to use the pretonic forms used in declara- 
tive verbal complexes, are characterised by the addition of a vowel after the 
final consonant of the form before the Class A pronominal form. These two fea- 
tures are illustrated in Table 2, which includes the CV- lexical preverbs to- and 
di/e- with Class A infixed pronouns, and the combination of the former with 
Class C, on the one hand, and the specific form of the preverb imm-, on the 
other. The (-)VC- lexical preverb included in Table 2 is con-, a type of preverb 
which ends with a nasal and in which the difference between Classes B and C is 
most conspicuous. 


Table 2: Some Old Irish lexical preverbs with infixed pronouns of Classes A, B and C. 


           Class A                             Class B    Class C
           to-        di/e-        imm-        con-       con-       to-

1sg.       do-m-ᴸ     do-m-ᴸ       imm-um-ᴸ    cotam-ᴸ    condam-ᴸ   do-dam-ᴸ
2sg.       do-t-ᴸ     do-t-ᴸ       imm-ut-ᴸ    cotat-ᴸ    condat-ᴸ   do-dat-ᴸ
3sg.masc.  d(o)-a-ᴺ   d(i/e)-a-ᴺ   imm-a-ᴺ     cot-ᴺ      condid-ᴺ   do-d-ᴺ
3sg.neut.  d(o)-a-ᴸ   d(i/e)-a-ᴸ   imm-a-ᴸ     cot-ᴸ      condid-ᴸ   do-d-ᴸ
3sg.fem.   do-s-      do-s-        imm-us-     cota-      conda-     do-da-
1pl.       do-n-      do-n-        imm-un-     cotan-     condan-    do-dan-
2pl.       do-b-      do-b-        imm-ub-     cotab-     condab-    do-dab-
3pl.       do-s-ᴺ     do-s-        imm-us-     cota-ᴺ     conda-ᴺ    do-da-ᴺ
3.3 The morphology of the Class A and B infixed pronouns


This section focuses on some pronominal infixes given in the previous section 
which involve morphological processes other than the mere addition of a seg- 
ment. Specifically, I refer to the third person singular masculine / neuter forms of 
Class A and to the combination of some lexical preverbs with the Class B forms. 

As for the third person singular masculine / neuter pronouns in CV- lexical 
preverbs such as (to- >) do- and di/e-/do- (see Table 2), it seems that the synchroni- 
cally most adequate description of the morphological process concerned is that 
they are the outcome of a process of replacement or substitution of the final vowel 
of the preverb by the vowel -a- plus the corresponding mutation. That is to say, the 
lexical preverbs (to- >) do-, di/e-/do-, ro-, fo- and the conjunct particle no- drop
their vowel in order to take the vowel -a- which characterises those two per-
sons: e.g. do- + -a-ᴺ/ᴸ > d(o)-a-ᴺ/ᴸ, i.e. da-ᴺ/ᴸ. This analysis is perfectly compatible
with the diachronic process of vowel elision, to be considered in section 5 below. 
In this light, the Class A third person singular masculine / neuter infixes seem to 
be a good example of replacive morphology (cf. Spencer 1998: 140-141), and are 
relevant in the context of this paper for the following two reasons. 

Firstly, the outcome of that process of morphological replacement may be the 
reason for the formal assimilation of the CV- lexical preverbs (to- >) do- and di/e-/ 
do- ‘from’ in pretonic position, a position in which their respective vowels should 
have been preserved as distinct. To be sure, the forms de and di can still be found 
in some verbs such as e.g. de-meccim ‘I despise’ (Sg. 39°1), but the extremely fre- 
quent compound do-gni ‘does, makes’ quoted above in (1), which is formed with 
the preverb de- ‘from’, has virtually only do- in pretonic position. In other words, I 
contend that one of the factors which led to the lack of distinction of the pretonic 
version of di/e- and (to- >) do- (both appearing in Old Irish as do- in most cases) 
was the substitution of the characteristic vowel of the preverb when it was com-
bined with the third person singular m./n. infixed pronoun -a-ᴺ/ᴸ, where both the
preverbs (to- >) do- and di/e- appeared in Old Irish as da-ᴺ/ᴸ. Other factors which
have surely played a role in that process are the coincidence in the consonant due
to the change (to- >) do- in pretonic position (see GOI § 178.2) and, eventually, the
loss of semantic identity of the element involved due to the lexicalisation of the 
meaning of the compound. All those conditions meet in the verb do-gni just quoted. 
The same reason has been adduced for the confusion of lexical preverbs with the 
shape (-)VC-, as in e.g. the form assidroillet quoted in example (2b) above.³


3 Not every case of loss of distinctiveness between lexical preverbs is left without response in 
Old Irish. The phenomenon known as ‘split for’ is in the end an attempt to maintain the 
Secondly, this idea of replacement as a morphological process operating in
the combination of Old Irish lexical preverbs with infixed pronouns can be ap- 
plied perfectly to Class B of infixed pronouns. Adopting the same synchronic 
perspective as for the Class A third person singular masculine / neuter forms 
above, the phonotactic combinations which can be considered for the (-)VC- lexi- 
cal preverbs are the following: (a) the combination of the Class B of infixed pro- 
nouns with the preverbs which end with a nasal (con-, in-) implies that the final 
nasal is substituted by the assumed /d/ of the infixed pronoun (i.e. con- + /d-/ > 
cot- /kod/); the same replacive process seems to apply to the combination of 
Class B infixes with preverbs in -s, i.e. (ess-, uss- >) as- and fris-, which give at- and 
frit- respectively; (b) when combined with preverbs ending with /r/ (for-, etar-), 
then the Class B pronoun is simply added to the preverb form and its initial dental 
sound is spelt either as -d- or as -t-; see the attested forms in Table 3 of section 4 
below; (c) for the lexical preverbs in a dental fricative (e.g. ad- /að/), the process at 
stake seems to be that the assumed /d/ of the Class B infixed pronoun again takes 
the place of the final consonant of the preverb or, alternatively, that the consonant 
of the preverb and that of the infix have ‘merged’ into a fortis consonant. 

The diachronic discussion on Class B infixed pronouns is a matter of sec- 
tions 5 and 6 below, but it is worth noting that the interpretation of the form of 
Class B as containing a /d/ sound is not the only one possible. In particular, the 
form at- corresponding to ad- /að/ may well be the outcome of a merger of the
final lenis of the preverb and an initial lenis sound, as if it were ad- /að/ + /ð/ >
at- /ad/. A parallel process may be the case of nepuid ‘not-being’ /‘nebud’/ de-
rived from *neβ'βuð considered in GOI (§ 137).


3.4 The use of Class A instead of Class C with 1st and 2nd 
person infixed pronouns 


In compound verbs with a lexical preverb of whichever phonotactic structure in 
the pretonic position and which include an infixed pronominal reference, the 
formal distinction between the infixed form marking declarative clause type 
(Classes A/B) and the infixed form marking relative clause type (Class C) is reg- 
ularly made with third person pronouns. This opposition has been already illus- 
trated in examples (1) and (2) above. 


difference between two semantically opposed lexical preverbs, for- ‘over’ and fo- ‘under’, in 
some specific morphological combinations in which they could be confused. See García-Castillero
(2017) for this question. 
However, compound verbs taking a first or second person infix are less sys-
tematic in this regard, so that the pronominal infixes of Classes A and B some-
times appear in forms in which relative clause type morphology is expected. In 
fact, this alternation between declarative and relative clause type morphology is 
a widespread phenomenon in Old Irish, not restricted to infixed pronouns; the 
reader may consult Ó hUiginn's (1986, 1998) studies quoted below in this section. 

One may therefore distinguish three groups of clauses according to whether 
declarative or relative clause type morphology (not only infixed pronouns) is 
associated with them. Group I consists of main declarative clauses and some 
specific subordinate clauses which are characterised by the regular use of de- 
clarative morphology. Group II consists of subordinate clauses which display 
both declarative and relative verbal forms. Group III consists of subordinate 
clauses which regularly show relative morphology. The verbal complex which 
has the relative conjunct particle -(s)aᴺ-, as well as other conjunct particles
such as the polar interrogative one already mentioned, are not mentioned in 
any of these groups because such conjunct particles do not appear in the at- 
tested forms included in Table 3 below. 

(I) Verbs with declarative clause type morphology are regularly used in:

(a) main declarative clauses,

(b) cleft sentences with an anteposed oblique constituent (i.e. a preposi-
tional phrase),

(c) adverbial subordinates introduced by coᴸ ‘so that’, maᴸ ‘if’, ciaᴸ
‘though’.⁴

(II) Verbs with either (nasalising) relative or declarative clause type morphol-
ogy are used in:

(d) complement (or noun) subordinate clauses,

(e) adverbial subordinate clauses introduced by iarsindí ‘after’, lase
‘when’ (mostly with a relative verb), amal ‘as’, (h)óre ‘because’.

(III) Verbs with relative clause type morphology are regularly used in:

(f) restrictive relative clauses of the leniting or nasalising type,


4 In the language of the Glosses, a meaningless Class C third person singular neuter infixed
pronoun -dᴸ- appears regularly in the verbal complexes in indicative mood after the subordi-
nating conjunctions maᴸ ‘if’ and ciaᴸ ‘though’, provided that there is no other semantically full
infixed pronoun. In line with the description in García-Castillero (2020, Chapter 5), this use of
-dᴸ- is to be interpreted as the introduction of a marker of syntactic dependency. If these condi-
tions are not met, these two subordinating conjunctions maᴸ ‘if’ and ciaᴸ ‘though’ are regularly
followed by a declarative clause type verbal complex. I hope to deal with the subordinating con-
junction coᴸ ‘so that’ and its relationship with the almost synonymous conjunct particle coᴺ- in
a future study.
(g) cleft-sentences with an anteposed subject or object NP (including cases of
figura etymologica, which involve nasalising relative marking),

(h) relative clauses after the light heads a and inti aní,⁵ and in subordinate
clauses after the temporal conjunctions aᴺ ‘when’ and inta(i)n ‘when’,

(i) relative clauses of types (f) after the stressed interrogative pronouns 
cia cid ‘who, what’. 


The observed variation between declarative and relative morphology in Group
II is determined by various factors (see Ó hUiginn (1998: 126-130) for the varia-
tion in complement clauses, and García-Castillero (2020, ch. 5) for the variation
in the third person singular of the copula after amal and (h)óre). Among these
factors, person plays a prominent role. By person, I refer to the cases in which 
the involved verbal complex includes either only one pronominal reference ex- 
pressed by means of an inflectional ending or two pronominal references, one 
of them an infixed pronoun. The general tendency is that non-third persons, 
whether in the inflectional ending or in the infixed pronoun (or in both), favour 
the use of declarative clause type morphology. 

As an illustration, consider (3), taken from Ó hUiginn (1986: 43, 45).
Example (3a) shows relative morphology by means of the pretonic particle 
no- followed by nasalisation (graphically not marked in nocretim /no’gred- 
jim/), while pridchim in (3b), which lacks the pretonic particle, represents a 
declarative clause type form. 


(3) a. hóre nocretim isu
because PV-NAS-believe.1SG.PRES Jesus.ACC
‘because I believe in Jesus’ (Wb. 1?2)
b. hore pridchim soscele do gentib
because preach.1SG.PRES gospel.ACC to Gentile.DAT.PL
‘because I preach (the) gospel to the Gentiles’ (Wb. 5*6)


The variation between Classes A/B and C for infixed pronouns of whichever per- 
son in the syntactic structures mentioned in Group II is to be accounted for as a 
part of the same variation which is observed in verbal complexes without in- 
fixed pronoun. The pair of glosses in (4) shows the variation between declara- 
tive and relative third person singular neuter forms in a verbal complex after 


5 According to the general definition provided in the works quoted in García-Castillero (2018: 
48-49), a “light head" is a demonstrative pronoun which is (more or less exclusively) used as 
the head of a relative clause. 
(h)ore ‘because’. In (4a), we have a Class C infix in the perfect form of as-beir
‘says’ (to be analyzed as as-ind-rubartatar). In (4b), the perfect form fritracatar, 
from the verb fris-accai ‘hopes for’, shows the Class B infixed form which char- 
acterises it as a declarative clause type form. This sort of variation is more fre- 
quent with non-third persons, a nice example being the gloss in (5), which has 
a verb with a Class C form (i.e. no-n-dob-molor-sa, from the simple molaithir 
‘praises’) coordinated with another verbal complex including a Class A infixed 
form (i.e. no-m-móidim, from the simple moidid ‘boasts’), both depending on 
the previous hore, again the syntactic structure of type (e) above. 


(4) a. [...] huare asinrubartatar tris pueri
because PV-NAS-3SG.NEUT(C)-say.AUG.3PL.PRET three children.NOM.PL
‘[. . .] because tres pueri had said it’ (Ml. 131212)
b. [...] huare fritracatar som a deo
because PV-3SG.NEUT(B)-hope.AUG.3PL.PRET=3SG.NEUT from God.ABL
‘[. . .] because they have hoped for it a deo’ (Ml. 131°10)


(5) hore nondobmolorsa et nom móidim indib
because PV-NAS-2PL(C)-praise.1SG.PRES=1SG and PV-1SG(A)-boast.1SG.PRES in.2PL
‘because I praise you and boast myself in you’ (Wb. 14°18)


The important aspect at this moment is that non-third person infixed pronouns of 
Class A (i.e. infixed pronouns which count as declarative clause type markers)
appear even in subordinate clauses included in Group III of the above classifica- 
tion, that is to say, in syntactic contexts in which relative morphology is consis- 
tently used. This can be observed in (6) and (7) with verbal complexes which 
have the conjunct particle no- and a CV- lexical preverb respectively. Both cases 
of (6) involve the same syntactic structure, i.e. a cleft sentence with anteposed 
subject, i.e. type (g) of Group III: example (6a) uses the expected Class C form 
-don- in nodonnertani, from the simple nertaid, but example (6b) shows Class A 
-m- in the simple beoigidir. The morphosyntactic structure in (7) corresponds to 
type (h) of Group III, i.e. the verbal complex introduced by the conjunction aᴺ
‘when’, and (7a) shows the expected Class C form -dat- in afundatferai, from
fo-fera, whereas the form in (7b) has the Class A form -m- in andumsennat, from 
do-seinn. 


(6) a. is hé nodonnertani 
be.3SG.PRES he PV-1PL(C)-strengthen.3SG.PRES=1PL
‘It is He that strengthens us.’ (Wb. రఘు 
b. is iress crist nombeoigedar
be.3SG.PRES faith.NOM Christ.GEN PV-1SG(A)-quicken.3SG.PRES
‘It is Christ’s faith that quickens me.’ (Wb. 19°20)


(7) a. afundatferai
when-PV-NAS-2SG(C)-present.2SG.PRES
‘when you (sg) present yourself (sg)’ (Ml. 3826-27)
b. andumsennat
when-NAS-PV-1SG(A)-pursue.3PL.PRES
‘when they pursue me’ (Ml. 39°28)


Even though it is far from being usual, because the 'expected' use of Class C is 
also frequently encountered in those types included in Groups II and III, the 
use of Class A instead of Class C is a well-established fact.⁶ The extent to which
non-third person infixed pronouns of Class B are used instead of the expected 
Class C form, that is to say, cases that run parallel to those in (6b) and (7b), is 
discussed in the next section. 


4 The descriptive problem: The opposition 
between Class B and Class C 


In contrast to what can be observed for the cases in which the forms of Class A 
are involved, the use of non-third person infixed pronouns of Class B instead of 
the Class C counterparts seems to be the rule. The list of forms included in 
Table 3 below is based on the collection provided by Sommer (1897) and has 
been revised with the aid of Kavanagh (2001), Griffith and Stifter (2013) and 
Bauer (2015). 

Table 3 is to be interpreted in the following way: the dictionary headword 
of the Old Irish verb is given in the leftmost column, the attested form and rele- 
vant syntactic structure with its English translation appear in the central col- 
umn, and the following three columns to the right, headed by the signs (I), (II) 
and (III), correspond to the three main groups of syntactic structures consid- 
ered in the previous section. In each of the rightmost columns, the following 


6 Of the 77 verbal complexes with a non-third person infix attested in Wb., Ml. and Sg. with a
pretonic CV- lexical preverb or with a conjunct particle appearing in a syntactic context of
Group III, i.e. cases such as those in (6) and (7), 33 cases bear Class A, i.e. 43%, e.g. (6b) and
(7b), and 44 cases bear Class C forms, i.e. 57%, e.g. (6a) and (7a).

Table 3: Old Irish 1st and 2nd person infixes of Classes B and C in their syntactic context.

Lemma form | Attested form | I / II / III
as-beir ‘says’ | amal asndonberat ‘as they say of us’ (Wb. 2712) | C (e)
 | (plebs dei) asndanberthe ni ‘It is plebs Dei that we used to be called.’ (Ml. 114?7) | C (g)
as-rochoili ‘determines’ | atamrochoilse ‘Determine me!’ (impv.) (Ml. 24°15) | B (a)
as-scarta ‘drives away’ | ma atamscartisse ‘if they were to drive me’ (Ml. 59°21) | B (c)
con-airléici ‘permits’ | condammairleicea ‘that He should let me’ (Ml. 38?11) | C (d)
 | iarsindi cotanrairlic ‘after He let us go’ (Ml. 125?9) | B (e)
con-boing ‘smashes’ | cochotabosadsi ‘so that he should crush you’ (Ml. 187) | B (c)
con-delca ‘compares’ | frinn fanisin cotondelcfam ‘with ourselves we will compare ourselves’ (Wb. 17°10) | B (b)
con-éicnigethar ‘compels’ | ithéside cotammeignigthersa ‘It is these by which I am compelled.’ (Ml. 21°10) | B (b)? B (g)?
con-erchloi ‘leads’ | cotomerchloither ‘I am led.’ (Sg. 177), gl. agor | B (a)
con-nerta ‘strengthens’ | cototnertsu ‘Strengthen thyself!’ (impv.) (Wb. 30?9) | B (a)
con-ocaib ‘lifts up, raises’ | an condammucbaitisse ‘when they used to beatify me’ (Ml. 39211) | C (h)
 | cotabucabarsi ‘Be lifted up!’ (impv.) (Ml. 4677) | B (a)
con-oscaigi ‘moves’ | cotammoscaigse ‘I should move [in the mountains].’ (Ml. 2993) | B (a)
 | condatoscaigther ‘that you might be moved’ (Ml. 23421), gl. commouere | C (d)
 | cotatoscaigthersu ‘Be moved, O God!’ (impv.) (Ml. 58714) | B (a)
con-rig ‘binds’ | cotobárrig ‘[he] has constrained you’ (Wb. 9°19) | B (a)
 | cotanrirastarni ‘We will be bound.’ (Ml. 13471) | B (a)
con-secha ‘corrects’ | cotob sechfider ‘Ye will be corrected.’ (Wb. 9°23) | B (a)


con-utuinc ‘builds’ | cotofutaincsi (MS cotofutaircsi) ‘He upbuilds you.’ (Wb. 8°16) | B (a)
in-árban ‘impels’ | atatairbined su ‘Let it impel You.’ (impv.) (Ml. 86°10) | B (a)
in-greinn ‘persecutes’ | atamgrennat ‘They persecute me.’ (Ml. 39713) | B (a)
 | donaib hi atamgrennat ‘to those who persecute me’ (Ml. 127*8) | B (f)
 | honaib hi atangrennat ‘by those who persecute us’ (Ml. 45°16) | B (f)
ind-saig (/ad-saig) ‘approaches’ | frisna preceptori atobsegatsi ‘like the preachers who go to you’ (Wb. 14°37) | B (f)
in-snaid ‘inserts’ | coatomsnassar ‘that I may be engrafted’ (Wb. 5°30), gl. ut ego inserer | B (c)
in-sorchaigedar ‘illuminates’ | coatabsorchaigther (MS coatabsorchaither) ‘that you may be illuminated’ (Ml. 53°15) | B (c)
in-togair ‘invokes’ | indattogarsa ‘that I invoke you’ (Ml. 724), gl. inuocandi te | C (d)
ad-aig ‘drives’ | massuthol atomaig ‘if it is desire that drives me’ (Wb. 10°26) | B (g)
 | dilmaine aisndisen atannaigni ‘Licence of narration impels us.’ (Ml. 93412) | B (a)? B (g)?
 | isfoirbthetu hirisse attotaig ‘It is perfection of faith that impels thee.’ (Ml. 93712) | B (g)
 | cid atobaich ‘What impels you?’ (Wb. 9°20) | B (i)
 | cid atobaig dó ‘What impels you to it?’ (Wb. 19°10a) | B (i)
ad-anaig ‘brings’ | atomanaste ‘that I should be brought’ (Wb. 14*20), gl. a uobis deduci | B (d)
ad-cí ‘sees’ | atatchigestar ‘You will be seen.’ (Ml. 59°12) | B (a)
 | atobciside ‘He perceives you.’ (Wb. 25°26) | B (a)
ad-cumaing ‘happens’ | cindas persine attotchomnicc ‘What sort of person art thou’ (lit. ‘what sort of person is it that has befallen you?’) (Wb. 6°13) | B (f)
ad-ella ‘visits’ | atdubelliub ‘I will visit you.’ (Wb. 774) | B (a)
ad-eirrig ‘emends’ | atanneirrig ‘who emends us’ (Ml. 114710), gl. qui nos [. . .] emendat? | B (f)
ad-gaib ‘reprehends’ | atabgabed ‘Let it reprehend you (pl).’ (impv.) (Ml. 20°11) | B (a)
ad-gair ‘sues, forbids, fascinates’ | adobragart⁷ ‘He sued you.’ (Wb. 19°5 [prima manus]), gl. uos fascinavit | B (a)
ad-gladathar ‘addresses’ | lase atat gladainn se ‘when I used to address you’ (Ml. 62°16) | B (e)
ad-gnin ‘recognizes’ | atatgentarsu ‘You will be known.’ (Ml. 121222) | B (a)
ad-indnaig ‘leads’ | atdomindnastar in ispaniam ‘I shall be brought in Hispaniam.’ (Wb. 7°5) | B (a)? B (d)?
ad-opair ‘sacrifices’ | atamroipred ‘I was offered.’ (Ml. 44°17) | B (a)
at-reig ‘rises’ | anatammresa ‘when I will rise’ (Ml. 31°14) | B (h)
fris-oirg ‘injures’ | fritumchomartsa ‘I have been offended.’ (Wb. 33°12) | B (a)
 | cia erat fritammior sa ‘How long will it afflict me?’ (Ml. 32427) | B (f)?
 | is ed aerat fritammiurat ‘[It is] so long [that] will they afflict me.’ (Ml. 33?1) | B (f)
 | fritammorcat ‘who injure me’ (Ml. 39°27) | B (f)
 | cum⁸ fritammoircise ‘when you injure me’ (Ml. 44°26) | B (e)
 | frisnahi fritammorcat sa ‘against those that afflict me’ (Ml. 62°21) | B (f)
 | ciofut fritatniarrsu ‘How long will he offend You?’ (Ml. 93°15) | B (f)
fris-tét ‘answers’ | fritumthiagar ‘I am answered.’ (Sg. 183?3), gl. obeor | B (a)
for-brissi ‘breaks down’ | sechnicoimnactar arnamait són fortanbristis ni ‘That is, our enemies have not been able to crush us.’ (Ml. 13524) | B (d)
for-cain ‘teaches’ | isdo fordoncain ‘It is for this it teaches us.’ (Wb. 31°16) | B? (b)
 | fortanroichanni ‘You have instructed us’ (Ml. 229) | B (a)
 | it [hé] fortan roichechnatarni ‘It is they that taught us.’ (Ml. 63°1) | B (g)
 | aforcital forndobcanar ‘the teaching by which ye are taught’ (Wb. 3°23) | C (f)
 | fortab cech ansa ‘I will teach you (pl.).’ (Ml. 53°14) | B (a)?
 | fordubcechna ‘who shall teach you’ (Wb. 9°16), gl. qui uos commonefaciat | C? (f)
for-díuclainn ‘devours’ | fortamdiucuilset sa ‘that they may devour me’ (Ml. 44°32), gl. uorare me | B (d)?
for-comai ‘preserves’ | fordomchomaither ‘I am preserved.’ (Sg. 139°2) | B? (a)
for-moinethar ‘envies’ | fordobmoinetar ‘They envy you.’ (Wb. 19427) | B? (a)
for-tét ‘helps’ | cofardumthésidse ‘so that you may help me’ (Wb. 7°12) | B? (c)
 | fortat tet su ‘It helps you.’ (Ml. 43°11) | B (a)
etar-diben ‘destroys’ | co etardamdibet sa ‘in order that they might destroy me’ (Ml. 44°31) | B? (c)
 | co etardamdibitisse ‘in order that they might destroy me’ (Ml. 54°14) | B? (c)
etar-scara ‘separates’ | lasse etardanroscarni ‘when he has separated us’ (Ml. 120?3) | C? (e)

7 Sommer (1897: 190) is probably right when he explains this form as due to a mistake of the
glossator (“Wohl Versehen des Schreibers für atob- [probably a mistake of the scribe for atob-]”).

8 Stokes and Strachan (1901-1910 = Thes. 1: 126, n. m) note that this Latin conjunction stands
for Old Irish intan ‘when’ or lase ‘while’.

information is encoded: the capital letters B and C refer to the infix class used,
and the small letter between parentheses indicates the syntactic structure in 
which it is used. Bearing in mind the possibility of having a declarative instead 
of a relative verbal form observed in the previous section, the information en- 
coded in those three columns must be read as follows. The presence of B in col- 
umn (I) is the expected procedure in the syntactic structures concerned. The 
presence of B or C in column (II) can be considered as a part of the general phe- 
nomenon of variation between declarative and relative clause type forms in 
those syntactic environments. Finally, in column (III), i.e. in the syntactic environments
in which relative morphology (in this case, Class C of pronominal
infix) is expected, the presence of a Class B form should be considered as paral- 
lel to the use of Class A in those situations, as illustrated in examples (6b) and 
(7b) of the previous section. The question mark after the capital letter indicates 
that the infix Class (either B or C) is not clear, something which is not rare at 
all. After the letter between parentheses, the question mark indicates that the 
syntactic structure involved is not clear. Note also that imperative forms (e.g. 
the form of the verb as-rochoili) are marked as (a), i.e. they are counted along 
with the declarative forms. 
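
The encoding just described can be made explicit with a short script. What follows is only a minimal sketch in Python: the names `Reading` and `parse_cell` are illustrative assumptions, not part of the original study, and the pattern simply assumes that each entry consists of a capital B or C, an optional question mark, a small letter from (a) to (i) between parentheses, and a further optional question mark.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reading:
    """One reading of a Table 3 cell (hypothetical helper type)."""
    infix_class: str           # "B" or "C"
    class_uncertain: bool      # question mark directly after the capital letter
    structure: str             # syntactic type, "a" to "i"
    structure_uncertain: bool  # question mark after the parenthesised letter


# Capital letter, optional "?", parenthesised type letter, optional "?".
_READING = re.compile(r"([BC])\s*(\?)?\s*\(([a-i])\)\s*(\?)?")


def parse_cell(cell: str) -> List[Reading]:
    """Split a cell such as 'B (b)? B (g)?' into its individual readings."""
    return [
        Reading(
            infix_class=m.group(1),
            class_uncertain=m.group(2) is not None,
            structure=m.group(3),
            structure_uncertain=m.group(4) is not None,
        )
        for m in _READING.finditer(cell)
    ]
```

Under these assumptions, an entry such as `C? (e)` yields one reading with an unclear infix class, whereas `B (b)? B (g)?` yields two alternative readings, each with an unclear syntactic structure.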

The descriptive problem posed by the forms included in Table 3 is the considerably
high number of Class B non-third person infixes in verbal forms in
which relative morphology is expected. To be more precise, in the syntactic
structures of Group III, this holds for 19 out of a total of 23 forms; this figure
of roughly 80% of unexpected Class B forms contrasts with the 43% of unexpected
Class A forms observed at the end of the previous section.
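
The proportions compared here can be checked with a simple calculation. The following is a minimal sketch in Python, in which the function name `share` is an illustrative assumption and the counts are those given in the text and in footnote 6.

```python
def share(part: int, total: int) -> int:
    """Return part/total as a percentage rounded to the nearest integer."""
    return round(100 * part / total)


# Group III contexts with CV- preverbs or conjunct particles (footnote 6).
class_a_unexpected = share(33, 77)  # Class A where Class C is expected
class_c_expected = share(44, 77)    # Class C as expected

# Group III contexts with (-)VC- preverbs (counts stated for Table 3).
class_b_unexpected = share(19, 23)  # Class B where Class C is expected
```

Computed in this way, 19 out of 23 comes to about 83%, which the text cites in round terms as 80%, against 43% for the unexpected Class A forms.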

The evidence provided by the Old Irish glosses, as it is presented in Table 3, 
permits us to make finer distinctions, in this case, according to the lexical pre- 
verb involved. The verbs in Table 3 are ordered according to whether they distin- 
guish between Classes B and C, such that those lexical compounds which 
distinguish (more or less frequently) between the two classes precede those 
which apparently do not. Thus, a fairly frequent verb with the lexical preverb as-
such as as-beir ‘says’ uses on two occasions the Class C forms of the non-third
person infixed pronouns, the reason being probably the nasalising character of
the relative forms involved. In the case of the preverb con-, one out of three cases 
of forms in which relative morphology (i.e. Class C) would be expected shows the 
form used as Class B. The verbal forms with the preverb in(d)- show only one 
case of Class C infixed pronoun out of four possible forms. The preverbs ad- and 
fris- only display Class B forms, regardless of the expected clause type morphol- 
ogy. Finally, the preverbs for- and etar- also show a considerable degree of 
confusion between the spellings with t and d. On the one hand, the forms 
spelt with -t- can be identified as Class B infixes, but some of them (e.g. fortan
roichechnatarni ‘who have taught us’) appear in forms in which relative morphology
would be expected, whereas the forms with -d-, which therefore seem
to be Class C forms, appear in syntactic environments in which declarative
morphology is undoubtedly expected (e.g. co etardamdibet sa ‘in order that
they might destroy me’). As is usually acknowledged, the spellings -t- and -d-
seem to interchange freely in the cases in which these two preverbs in pretonic
position are combined with a pronominal infix.

The conclusion seems plausible that non-third persons actually are on the 
verge of making no distinction between Classes B and C, that is to say, that a 
good deal of the compounds with preverbs of the type (-)VC- only use one set of 
non-third person infixed pronouns, regardless of the expected declarative or 
relative morphology of the verb. The descriptive problem may therefore be for- 
mulated in terms of paradigm defectiveness: is the opposition between Classes 
B and C an actually effective opposition in the non-third persons, so that the 
rare cases in which an apparently Class C form may be identified are actually a 
sort of incipient attempt to establish that distinction? 

In addressing this descriptive problem, it is not advisable to take for granted the existence
of a specific differentiation in a particular NP type (in this case, first
and second person pronouns) merely because that differentiation is carried
through in other types of NPs (in this case, third person pronouns). The
former constitutes a natural class which may show specific inflectional features
not observable in the latter. Witness the various cases of lack of formal expression
observable in the first and second person pronominal elements of not a few ancient
Indo-European languages for a grammatical opposition which is formally
marked in the remaining NPs, as detailed in Garcia-Castillero (2001). In itself,
the assumption of such an asymmetric situation between non-third and third
person pronominal markers would not be objectionable.

The position defended in this paper is that such an asymmetric paradigm 
in the infixed pronouns attached to (-)VC- lexical preverbs (with the exception 
of imm- and ar-) must be taken seriously, so that the differentiation between 
relative and declarative forms is more or less systematic for the third persons, 
but not for the non-third persons. Table 4 below can be viewed as a comple- 
ment of Table 2 above as a means of representing more realistically the situa- 
tion of the (-)VC- lexical preverbs other than con- and in-, i.e. the situation of 
for-, etar-, fris-, and ad-, which do not distinguish systematically between B 
and C Classes of non-third person infixed pronouns.

Table 4: The lexical preverbs to- (representing CV-) and for- (representing (-)VC- preverbs) and the use of Classes A, B and C of infixed pronouns in the language of the Old Irish glosses.

Person | Class A (to-) | Class B (for-) | Class C (for-) | Class C (to-)
1sg. | do-m-ᴸ | for-t/dam-ᴸ | for-t/dam-ᴸ | do-dam-ᴸ
2sg. | do-t-ᴸ | for-t/dat-ᴸ | for-t/dat-ᴸ | do-dat-ᴸ
3sg.masc. | d(o)-a-ᴺ | for-t-ᴺ | for-id-ᴺ | do-d-ᴺ
3sg.neut. | d(o)-a-ᴸ | for-t-ᴸ | for-id-ᴸ | do-d-ᴸ
3sg.fem. | do-s- | for-t/da- | for-t/da- | do-da-
1pl. | do-n- | for-t/dan- | for-t/dan- | do-dan-
2pl. | do-b- | for-t/dab- | for-t/dab- | do-dab-
3pl. | do-s- | for-t/da- | for-t/da- | do-da-


5 The diachronic problem: Paradigmatic split 
or merger? 


Up to this point, the diachronic perspective has been adopted only on a couple 
of occasions to consider the morphological processes leading to the formation of 
some Class A infixed pronouns, side by side with a more descriptive account 
of the same phenomenon. This section introduces the systematic consideration 
of the origin of the Class B forms, for which a purely descriptive stance has hith- 
erto been adopted. 

Whereas Class B clearly represents a problem in this regard, there is great 
consensus (if not complete agreement) among scholars about the etymology of 
the Old Irish Classes A and C of infixed pronouns. 

To begin with Class C, the -d- of this set of forms is etymologically the same 
element as the -d- of the negative relative conjunct particle nad- mentioned in sec- 
tion 2 above. Especially in the case of the first and second persons, the Class C 
forms can be analysed straightforwardly as the combination of that -d- with the cor- 
responding infix of Class A: e.g. Class C first person singular -dom- equals *-d(V)- + 
Class A first person singular -m- and so on. For the ultimate (i.e. Proto-Indo-European)
origin of this Old Irish -d- marker associated with relative clause type
marking, namely, the connective clitic *de, I refer to the observations and references
in Watkins (1963: 24) and McCone (2006: 273-276).
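
The segmentation of the non-third person Class C forms described above can be sketched as a simple concatenation. The snippet below is illustrative only: the names `CLASS_A` and `class_c` are assumptions, and the linking vowel is left as a parameter because its quality varies in the attested forms (compare -don- in nodonnertani and -dat- in afundatferai).

```python
# Class A non-third person infixes, as they appear after to- in Table 4.
CLASS_A = {"1sg": "m", "2sg": "t", "1pl": "n", "2pl": "b"}


def class_c(person: str, vowel: str = "o") -> str:
    """Build a Class C infix as -d- + linking vowel + Class A infix."""
    return f"-d{vowel}{CLASS_A[person]}-"


first_singular = class_c("1sg")        # -dom-
first_plural = class_c("1pl", "o")     # -don-, as in nodonnertani
second_singular = class_c("2sg", "a")  # -dat-, as in afundatferai
```

The sketch covers only the first and second persons, for which the text states that the analysis is straightforward; the third person forms are not modelled.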

As for the Class A forms, I focus on the third person singular masculine / 
neuter infix(es), which will play an important role in the next section. It is usu- 
ally assumed that they represent the clitic accusative masculine and neuter 
forms of the Proto-Indo-European anaphoric stem *e/i-, namely, the forms *em 
and *ed respectively. For these Proto-Irish and Proto-Insular Celtic forms, see 
Schrijver (1997b: 54-56). Assuming that the dental plosive of the neuter form 
was lost in absolute final position, the resulting forms *em and (*ed >) *e explain
the nasalising and leniting effects of the corresponding infixes in Old
Irish, which have been noted as -a-ᴺ/ᴸ. As for the combination with the lexical
preverbs with the shape CV-, the process assumed in section 3.3 which involves 
the elimination of the first of two vowels standing in hiatus seems to be the 
best diachronic explanation for these forms. This diachronic origin is perfectly 
compatible with the synchronic interpretation in terms of replacive morphol- 
ogy, as also stated in section 3.3. 

It is also worth noting in this section that the exceptional character of the Old 
Irish lexical preverbs imm- and ar- is due to the fact that they come from forms 
which originally ended in a vowel (see GOI §411). This original vocalic auslaut
agrees with or directly explains (a) the use of Class A infixed pronouns with the 
lexical preverbs imm- and ar- as observed in Table 2 above, (b) the relative clause 
type forms with these lexical preverbs (i.e. imme/a- and are/a-), as well as (c) the 
palatal character of the preverb ar- in the tonic position of the prototonic forms.⁹

The preverb noted in Table 1 as *ath(i), which appears as aith- ‘re-’ in its 
stressed form, and as ad- in the pretonic position, takes Class B infixes and 
must be assumed, at least for the form used in pretonic position, as a preverb 
without the final vowel, in spite of its most probable etymology, which has a 
final vowel. In order to explain the different behaviour of this *ath(i) with respect
to ar- and imm-, Uhlich (2009-2010: 154) adduces the fact that the former lacks a
prepositional counterpart as the reason for the maintenance of the form without
vowel *ath- > Old Irish ad- ‘re-’. The lexical preverbs imm- and ar- had a


9 Consider e.g. the imperative airbir (biuth) ‘consume!’ (Wb. 29?25), from ar-beir, where the 
stress falls on the first vowel of the preverb; similarly, nádairchissa ‘that he spare not’ (Wb. 
5°35), a present subjunctive from the verb ar-cessi. In forms like these, the original shape of 
the preverb may have been both *-are- and *-ari-. I leave this question of the original auslaut 
of Old Irish ar- open.

prepositional counterpart, in which Uhlich assumes that the final vowel would
have been maintained and from there extended to the preverb.¹⁰

A clear consequence of this diachronic observation on imm- and ar- is that
the phonotactic structure of the lexical preverb involved must be taken as the
decisive factor determining the Class (whether A or B) of the infixed pronoun
used to express declarative clause type. In order to include ar- and imm-, the 
shape hitherto considered as CV- should be reformulated as *(V)CV-, as (*-)CV- 
or, more simply, as (-)CV-. This is an important argument to be considered in 
the diachronic discussion in this section, but is not the explanation itself, since 
it still does not state the reason for the use of Class B infixes after lexical pre- 
verbs of the shape (-)VC-. 

As for Class B infixed pronouns, and in line with the descriptive question 
formulated in the previous section, the diachronic problem can be formulated 
as follows: is the virtual lack of distinction between Classes B and C in the non- 
third persons a remnant of an original situation in which there was actually no 
such distinction, or is it the outcome of a process in which two originally differ- 
ent paradigms (Classes B and C) are not distinguished anymore, or only scarcely, 
in the non-third persons? In other words, one must decide between a process of 
paradigmatic split or a process of paradigmatic merger, respectively. 

The diachronic explanation to be developed in the next section assumes 
that there was originally a single paradigm of forms and that a process of mor- 
phological split has given rise to two forms for some elements of the paradigm, 
in this case, in the third persons, and tentatively in the other persons and for 
some lexical preverbs. 

The opposite view has also been defended, most conspicuously perhaps by 
Thurneysen (GOI §455), who relies on the form adopted by the (-)VC- preverbs
con- and in(d)- in the expression of Class B pronominal infixes. Certainly, cot- 
and at- can be the phonologically regular outcome of previous sequences such as 
*kon-t- and *in-t- respectively. On the basis of these forms, Thurneysen assumes 
that Class B is initially the form of the Proto-Indo-European demonstrative stem 
*so-/to-, which was initially used to express some third persons, and was later 
generalised for the remaining persons. The main problem this assumption faces 
is that it completely lacks a motivation for the use of two different infixes, say, 
the acc.sg. masc. *em (from the PIE stem *e/i-) as the forerunner of the third 


10 Though it is certainly difficult to demonstrate, it may well be the case that aith- / ad- ‘re-’, 
due to the fact that it is less frequent than ar- and imm- and also due to its formal similarity 
with ad- (in line with the argument in section 3.3 on the loss of formal distinctivity of previ- 
ously different preverbs, especially in pretonic position), has been secondarily attracted to the 
group of the (-)VC- lexical preverbs.

person singular masculine of Class A, and the acc.sg. masc. *tom, as the forerunner
of the corresponding form of Class B. Since the distribution of Classes A and
B is clearly associated with the phonotactic shape of the lexical preverb involved in
the pretonic position, as just stated, one should ask why a lexical preverb such 
as, say, *di/e- took *-em-, whereas forms such as *eks- > *ess- would have taken 
*-tom- to express the same person in exactly the same syntactic context. 

For other proposals for the origin of Class B which are located in a wider 
discussion but which defend two originally different paradigms, see McCone 
(2006: 229–231).


6 The present hypothesis: The paradigmatic split 
into B and C 


This section develops the hypothesis of a paradigmatic split according to which 
a unique paradigm, the one which is Class C in Old Irish, split into two different 
paradigms, Classes B and C. This explanation has three points: (i) the trigger of 
the split, dealt with in section 6.1; (ii) the specific syntactic context(s) in which 
the use of Class C in declarative clause type forms was enabled, in section 6.2; 
and (iii) the very process of paradigmatic split, in section 6.3. 


6.1 Watkins’ (1962) ‘Forward Reconstruction’ and the trigger 
of the split 


The starting point of this diachronic explanation is the consideration of the paradigmatic
structure in which the [± third person singular masculine / neuter
pronominal infixes] feature cross-cuts the declarative vs relative clause type opposition,
in line with the second issue considered in section 2 above.

The resulting schema of formal oppositions is illustrated in Table 5 with the 
lexical compound do-beir ‘brings, gives’, the lexical preverb of which has the 
CV- shape. On the one hand, the declarative and relative forms without infixed 
pronoun contrast by the lack or presence respectively of relative mutation (na- 
salisation or lenition) in the first consonant of the tonic part; this is an example 
of one of the formal strategies in which this clause type opposition is marked. 
On the other hand, the contrast between the latter form, i.e. the relative clause type
form without pronominal infix, and the declarative form with such an infix is
expressed by means of the different vowels of the pretonic preverb, which are
-o- and -a- respectively; this difference has been discussed in section 3.3 above.

Table 5: [± 3rd person singular masculine / neuter pronominal infixes] and the difference
declarative vs relative clause type in a lexical compound with a (-)CV- preverb.

 | Declarative clause type | Relative clause type
[- 3sg. masc./neut. pronominal infixes] | do-beir ‘(s)he brings’ | do-ᴺ/ᴸbeir ‘who(m) / that ([s]he) brings’
[+ 3sg. masc./neut. pronominal infixes] | da-ᴺ/ᴸbeir ‘(s)he brings him / it’ | dod-ᴺ/ᴸbeir¹¹ ‘who / that brings him / it’


The four Old Irish verbal complexes turn out to be formally distinct, subtle and 
minimal as the difference may be. 

The basic diachronic assumption of this proposal is that the Class B of in- 
fixed pronouns is the response to a situation in which (some of) the (-)VC- lexical 
preverbs were not able to make an important distinction, the one between the 
nasalising and leniting relative forms, on the one hand, and the declarative form 
including the third person singular masculine / neuter infixed pronouns as ini- 
tially expected according to the same origin assumable for those infixes in combi- 
nation with CV- lexical preverbs (i.e. *em and *e[d]), on the other. The reason for 
this formal coincidence was that the latter regularly lost its palatal character¹² in
pretonic position, the only remnant of the infixes being nasalisation and lenition.
This initial situation is reflected in Table 6, in which the declarative clause
type form combined with the third person singular masculine / neuter infixed pronoun
(i.e. the form *ad-ᴺ/ᴸci ‘(s)he sees him / it’) is ‘forward-reconstructed’ in the
sense of Watkins (1962: 2-3) and Eska (2003), i.e. is reconstructed as an expected
Old Irish form which, however, is not attested. In the same table, the assumed
relative form including the corresponding infixed pronoun has the expected outcome
for a Class C form, i.e. *at-ᴺ/ᴸci /ad-/ ‘who sees him / it’ < *ad-de-e(m)-. This
form, which is in Old Irish declarative due to its Class B infixed form, is marked
with an asterisk because it is given as a relative form.

As for as-, the form with Class A third person singular masculine / neuter
pronominal infix would have been *es′-ᴺ/ᴸ- (cf. the Old Irish conjugated preposition
es(s), mostly ass ‘out of it’), depalatalised as *es-ᴺ/ᴸ- and then *as-ᴺ/ᴸ-, in


11 Note that the mutations marked as ᴺ/ᴸ in the forms of this table do not always have the same
function: whereas in do-ᴺ/ᴸbeir they are the relative mutations, in da-ᴺ/ᴸbeir and dod-ᴺ/ᴸbeir
they are the mutations provoked by the third person singular masculine and neuter infixes
respectively.

12 This palatal character is the effect of the so-called second or third palatalizations, which
assume palatalization by the effect of a vanishing front vowel (McCone 1996: 117, 119).

Table 6: The expected homonymy of declarative form with 3rd person singular masculine /
neuter pronominal infixes and relative without such pronominal infix in a lexical compound
with preverb (-)VC-.

 | Declarative clause type | Relative clause type
[- 3sg. masc./neut. pronominal infixes] | ad-ci ‘(s)he sees’ | ad-ᴺ/ᴸci ‘who(m) / that ([s]he) sees’
[+ 3sg. masc./neut. pronominal infixes] | (*ad-e(m)- >) *ad′-ᴺ/ᴸ- > Old Irish *ad-ᴺ/ᴸci ‘(s)he sees him / it’ | *at-ᴺ/ᴸci ‘who / that sees him / it’


much the same way as most cases of the conjugated preposition just quoted.
Other (-)VC- lexical preverbs would have evolved as (*wor-e[m]- >) *foir-ᴺ/ᴸ- (by
depalatalisation of unstressed forms > Old Irish *for-ᴺ/ᴸ-), and (*kon-e[m]- >)
*coin-ᴺ/ᴸ- (by depalatalisation) > Old Irish *con-ᴺ/ᴸ-.¹³ In other words, I am assuming
here that the palatal character caused by the (Class A) third person singular
masculine / neuter infixed pronoun in (-)VC- lexical preverbs would have
been lost in pretonic position and that this brought about the complete homonymy
of this form and the same lexical preverb followed by relative nasalisation
and lenition.

As for the depalatalisation itself, current treatments such as McCone (1996: 135)
and Stifter (2009: 62) assume an Early Old Irish process affecting consonants in
unstressed words such as the copula, prepositions, particles, etc., and this inevitably
implies a relatively recent chronology for the rise of Class B infixed pronouns,
not very long before the classical Old Irish period.

The Early Old Irish texts in which some such functional words still show pal- 
atal character are the Cambray Homily and the prima manus of the Wb. Glosses, 
both dated approximately between the end of the seventh and the beginning of 
the eighth centuries. There is a potential problem in that the Cambray Homily in- 
cludes a case of Class B pronominal infix, precisely the third person singular 
neuter form (autrubert ‘has said it’ [Thes. 2: 246.14], from as-beir). In this 
sense, one could argue that, if depalatalisation was still a process not accom- 
plished in these Early Old Irish texts, then the creation of the Class B infixed 
form in the manner just assumed could hardly have happened. However, the 


13 The specific situation of the lexical preverb ete/ir- is too complicated to be considered in
this paper.

forms which still show palatalisation are of a different nature to the lexical preverbs
of verbal complexes, as is perhaps suggested by Thurneysen (GOI §168).
They are independent conjunctions which still show their original palatal charac- 
ter: amail ‘as’ (Thes. 2: 245.14) for later amal, oire ‘because’ (Thes. 2: 246.5-6) for 
later (h)óre, (h)uare (as in e.g. (3), (4) and (5) above). In the case of these and 
other conjunctions, the depalatalisation is probably a process to be assumed for 
later phases, as seems to be the case of intain ‘when’ (Thes. 2: 247.3), which in
Wb. appears mostly as intain and in Ml. and Sg. mostly as intan; or even air ‘for’
(Thes. 2: 245.33), which later appears both as air, as in example (2a), and as ar. 
These differences are probably due to the different chronologies of the gramma- 
ticalisation processes leading to their character of conjunction, which is probably 
accompanied by a difference in stress. With respect to the independent conjunc- 
tion air ‘for’, the corresponding stressed conjugated form aire ‘for him, for it’ ap- 
pears in this manner both in the Cambray Homily (cf. Thes. 2: 244.33) and in 
classical Old Irish.¹⁴

None of the unstressed forms quoted in the previous paragraph, however, 
are constitutive components of the Old Irish verbal complex, and the process of 
depalatalisation assumed at this moment is the one affecting lexical preverbs 
located in the pretonic position of this morphological structure. In this sense, 
the Cambray Homily already has the form ma arfoimam (MS maar foim am) ‘if 
we receive’ (Thes. 2: 245.12) from ar-foim, with the lexical preverb ar- in pretonic 
position also found in later Old Irish texts. As stated in the previous section, the 
declarative form of the preverb ar- with third person singular masculine / neuter infix
(i.e. ara-ᴺ/ᴸ) and the conjugated preposition aire ‘for him, for it, therefore’ are
best explained from a previous form *ari-, whereas the conjunction air ‘for’ and
the bare pretonic form of the lexical preverb (as ar- in the previous form of the
Cambray Homily) could be derived from both *ari- and *are-. The expected palatalisation
in the pretonic form of the lexical preverb in its declarative clause type
form without infixed pronoun has undergone the process of depalatalisation of unstressed
forms now under consideration.


14 To conclude the treatment of the so-called Early Old Irish attestation, the combination of 
the preposition ar- with the oblique relative conjunct particle -(s)a"- in aire sechethar ‘that he 
follow’ (Thes. 2: 244.31) seems to maintain the palatal character of the pretonic sequence, but 
the same sequence shows no palatal character a couple of lines before in the same text: ara 
tinóla ‘that he gather’, are n-airema ‘that he receive’ (Thes. 2: 244.27—28), from do-inóla and 
ar-eim respectively. This conjunct particle, which is the outcome of a relatively recent process 
of internalisation of a previously autonomous sequence of preposition and demonstrative "sa" 
(see the observations in García-Castillero 2018), may well have preserved some features of its
previous situation. 
The important aspect for the question of the depalatalisation in preverb
forms with Class A third person singular masculine / neuter infixed pronoun 
such as (*ad-e[m]- >) *ad’-N’"- > Old Irish *ad‘/"- and (*ess-e[m]- >) *es"- > Old 
Irish “asY/t-, as well as *foir/"- > Old Irish *for’’"- is that its most adequate 
parallel is the form of the bare preverb ar- just considered, i.e. the form of the 
preverb used in the declarative clause type form without infixed pronoun, and 
this form already shows the depalatalised form in Early Old Irish, i.e. *ari/e- > 
*ar- > ar- (in ar-foimam). The depalatalisation of unstressed elements can 
therefore be considered a relatively long process which takes place before and 
after the Early Old Irish period, but the crucial change assumed above in 
Table 6 for the lexical preverbs with the shape (-)VC- seems to have happened 
well before that time. 

It must be said that the homonymy between forms which can be included in
the previous table is not always avoided. The two following cases can be men- 
tioned. First, most lexical compounds with the deuterotonic shapes CV-VC(-) and 
(-)VC-VC(-) make no systematic distinction between declarative and leniting rela- 
tive clause type forms without infixed pronoun, so that e.g. do-adbat may be ‘(s) 
he shows’ and ‘who shows’ or ‘whom/which (s)he shows’; ad-aig, ‘(s)he drives’ 
and ‘who drives’. Second, verbs with the lexical preverb ar- have the same form 
for the relative clause type form and the declarative with third person singular 
masculine / neuter infixed pronoun (e.g. ara! Ngaib both ‘who seizes’ and ‘(s)he 
seizes it / him’). Note that the second case is the one which is being considered 
for the lexical preverbs taking Class B infixed pronouns. 

In other cases, however, a more marked verbal complex is used to express the 
meaning of another form in which a phonological process has caused a certain dis- 
turbance in the distinction of the categories included in Tables 5 and 6 above. I 
refer to the case of the verb fo-fera ‘produces, causes’ included in Table 7. In this 
verb, the regular form of the relative without pronominal infix is fo-era,¹⁵ but this
combination is also expressed by means of the relative form with pronominal 
infix: the form fodera must be interpreted simply as ‘which causes’ in Wb. 3°33, 
Wb. 3°34, Wb. 545, Wb. 14°42, Ml. 93713, Ml. 55“11, and as ‘which causes it’ in Wb.
33°12, Ml. 32°5. The same use of the Class C infix deprived of any referentiality can
be assumed for dodesta ‘what is lacking’, the relative form from the CV-VC(-) verb 
do-esta ‘is lacking’ which is used after the light heads a" and ani ‘that (what)’.


15 This form can be deduced from the past subjunctive fuerad ‘that [Joshua] provided’ (Wb. 
33°13). The lack of -f- in the leniting relative form fo-era is the regular outcome of the lenition 
of /f/. 
Table 7: [+ 3rd person singular neuter pronominal infix] and the difference declarative vs.
relative clause type in the lexical compound fo-fera ‘causes’.


                                   Declarative clause type    Leniting relative clause type
[- 3sg. neut. pronominal infix]    fo-fera                    fo-era | fo-d-era
                                   ‘(s)he causes’             ‘who causes’
[+ 3sg. neut. pronominal infix]    fa-era                     fo-d-era
                                   ‘(s)he causes it’          ‘who / that causes it’


Turning to the situation of homonymy resulting in Table 6 above, the remedy 
was to make use of the most marked form, i.e. the form originally used to express 
these infixed pronouns in a relative clause type verb, in order to express also the 
pronoun of the declarative clause type verb. This first step in the process leading 
to the creation of Class B infixed pronouns is illustrated in Table 8. 


Table 8: [+ 3rd person singular masculine / neuter pronominal infixes] and the difference 
declarative vs. relative clause type in a lexical compound with preverb (-)VC-. 


Declarative clause type Relative clause type 


[- 3sg. masc./neut. pronominal infixes] ad-cí ad í 

‘(s)he sees’ ‘who(m) / that ([s]he) sees’ 
[+ 3sg. masc./neut. pronominal infixes]  *ed-""ef at." cj € (*at-"/ cf) 

‘(s)he sees him / it’ ‘who / that sees him / it’


In spite of the different use of the relative form including the object pronominal 
reference, both processes assumed in Tables 7 and 8 share two remarkable fea- 
tures. First, in the initial situation, one of the expected forms turns out to be 
problematic, due either to its inherent structure (the hiatus resulting in the 
form fo-era), or to its lack of differentiation from another form (ad-N'*cí as ‘who[m]
[(s)he] sees’ and ‘(s)he sees him / it’). Second, the clearer (in the sense of
formally more perceptible) form with infixed pronoun of Class C comes to the 
rescue in both cases: on the one hand, the relative form with neuter infixed pro- 
noun (i.e. with Class C form) fo-d-era ‘who causes it’ lacks the phonotactically 
uncomfortable hiatus of the attested form fo-era and is used to express (also) 
the bare relative form ‘who causes’; on the other, the assumed relative form
with pronominal infix at-".cí ‘who sees him / it’ has the advantage that it 
shows clearly the form of a pronominal infix and is used to express the declarative
form with pronominal infix ‘(s)he sees him / it’.
In both cases, of course, it is necessary to find a syntactic environment in
which the use of the surrogate form can be justified. 


6.2 The bridging context 


In the previous section, the relative form with pronominal infix of both fo-fera and 
compounds with a lexical preverb ending in a consonant such as ad-cí has been assumed
to be the surrogate form for other forms which became problematic from
the point of view of their morphological distinctiveness. The portmanteau mor- 
pheme which is the Class C infixed pronoun, expressing both person/number and 
relative clause type character, is used to express only one of these two categories. 

The specific syntactic environment and pronominal forms in which those 
two changes may have taken place constitute the ‘bridging context’ defined by 
Heine (2002: 86) as “a specific context giving rise to an inference in favor of a 
new meaning”. As has been made clear in the previous section, the differen- 
ces and similarities between the cases of the verb fo-fera and of the verbs with a 
lexical preverb ending in a consonant must be clearly stated. 

As for the use of fodera as the relative form of fo-fera, the most plausible 
scenario is provided by the cataphoric use of the third person singular neuter 
infix, a context in which the meaning of the pronominal marker can be lost. 
This cataphoric use of the Old Irish pronominal elements attached to the verbal 
complex has been studied by Lucht (1994) and, more recently, by Eska (2010). 
An example among many others is rafoiligestar (i.e. r[o]-a(?-foiligestar) in (8), 
in which the third person singular neuter infix -a?- cataphorically, i.e. prolep- 
tically (see GOI § 421), refers to the object NP introduced by the light head a" 
(anadfiadar is indsalm so) appearing later in the same sentence. 


(8) rafoiligestar nathan duduid 
AUG-3SG,;"revealsss pret Nathanyoy to=Davidpar 
anadfiadar is indsalm so
న. thepar.sc.masc=PSalMpar PROX 


‘Nathan revealed to David what is related in this psalm.’ 
(lit. ‘Nathan revealed it to David, that which is related in this psalm’) 
(Ml. 109^2)


16 The “bridging context”, which Heine assumed to be a step in the process of grammaticalisation,
should therefore also be considered in the process of paradigmatic split as the environment
in which a functional change is possible.
As for the use of a Class C pronominal infix, i.e. of an infix which is initially
expected in a relative clause type form, to express the same pronominal refer- 
ence in a situation in which a declarative clause type form could be expected, 
the most adequate bridging context is in the syntactic structures included 
above in Group II, since complement clauses and adverbial subordinates intro- 
duced by conjunctions such as amal ‘as’ or (h)óre ‘because’ show in the Old 
Irish glosses a certain degree of variation between declarative and relative 
clause type. The possibility of variation in those syntactic contexts basically in- 
volves the use of Class A instead of Class C, and this would have been inter- 
preted as a neutralisation of that difference. 

An example of this situation is given in (9), where the third person plural 
infixed pronoun of Class A -s- observed in Table 2 above (i.e. a declarative 
clause type marker) is used in the verb after the subordinating conjunction hóre
‘because’, which is often accompanied by a nasalising relative verb. For this
use, see Ó hUiginn (1986). 


(9) hóre nosmóidet iprecept 
because PV-3PL(A)-boast3py,.prrs in=preaching,,; 
‘because they boast in preaching, [. . .]’ (Wb. 175)


The form fritracatar quoted in (4b) above and repeated here in (10), from the 
verb fris-accai, is especially significant at this point. This example was adduced 
in section 3.4 as a case in which declarative morphology is used in a syntactic 
context in which relative morphology can also appear. In the light of the pres- 
ent diachronic hypothesis, the Class B third person singular neuter infixed pro- 
noun in fritracatar may well be a case in which the original Class C form added 
to the original form of the preverb involved (i.e. *wri0- + /-0.-/ > /frid'-/, spelt 
as frit-) was liable to an interpretation as a declarative clause type marker, i.e. 
in a place in which the declarative clause type Class A infix can also appear. 


(10) [...]huare fritracatar som a deo 
... because PV-3SGyeu7(B)-hopeaue.3pt.prer=35Gneur from Godas 
‘[. . .] because they have hoped for it a deo’ (Ml. 131710)


Most cases with first and second person infixed pronouns included in column (II)
of Table 3 perfectly represent the syntactic contexts in which the third person
singular masculine / neuter infixed pronouns of Class C
would be reinterpreted as markers of declarative clause type due to their clearer
form. In most cases of the column (III) of Table 3, the interpretation of the Class 
B infixed form as coming from a Class C form makes perfect sense: e.g. the form 
anatammresa ‘when I will rise’ included in Table 3, from the verb at-reig, ap- 
pears in a structure in which relative morphology is otherwise compulsory, and 
this agrees with its diachronic interpretation as an originally Class C form. 

Up to this point, the explanations for the relative form of fo-fera and for the 
rise of Class B infixed pronouns have run in quite a parallel fashion. They are 
different, however, in that in the first case, only one single pronominal element 
is involved, the third person singular neuter infixed pronoun, and one single 
lexical element. In the case of Class B of infixed pronouns, the whole pronomi- 
nal paradigm comes into play and, in addition to that, there are quite a number 
of compound verbs involved, i.e. those which had a (-)VC- lexical preverb in the 
pretonic position. 


6.3 The mechanism of paradigmatic split 


It is time to delineate the morphological process by means of which two para- 
digms, i.e. Classes B and C of pronominal infixes, arise from a single one, i.e. 
Class C. For this purpose, the well-known case of paradigmatic split in which 
two Latin nouns, deus, dei ‘god’ and diuus, diui ‘deity, divine’, have developed 
out of a single original paradigm, the one included in step (i) of Figure 1, may 
serve to illustrate the basic mechanism of this morphological change. The 
changes that occurred from step (i) to step (iib) of Figure 1 are of a phonological 
nature and trigger the later process of paradigmatic split. First, step (iia) shows 
the effect of the regular changes /-wo-/ > /-o-/ and /-ej-/ > /-ē-/; forms with and
without /-w-/ are thus created in the paradigm. In step (iib), the change from
closed /-ē-/ to /-ī-/, which happens in forms such as the genitive and ablative
singular but is prevented in other forms such as the nominative and accusative
singular due to the shortening of that /-ē-/ in prevocalic position, causes a still
clearer differentiation between those two groups of forms. Properly, the split is 
in step (iii), where each original group of forms analogically creates the missing 
parts of their paradigms. 

The change from (iib) to (iii) in Figure 1 is graphically shown as the substitution
of the horizontal by a vertical line, and this is an appropriate representation
of the creation of two different paradigms from a situation in which two
variants within one and the same paradigm were created. 

I therefore assume two main steps for the process of paradigmatic split, 
first the introduction of some sort of variation within a given paradigm (ii) in 
           (i)           (iia)         (iib)         (iii)
nom.sg.    *dejwos    >  *dēos      >  *deos      >  deus     diuus
acc.sg.    *dejwom    >  *dēom      >  *deom      >  deum     diuum
gen.sg.    *dejwī     >  *dēwī      >  *dīwī      >  dei      diui
abl.sg.    *dejwōd    >  *dēwōd     >  *dīwōd     >  deo      diuo


Figure 1: The process of paradigmatic split leading to Latin deus, dei ‘god’ and diuus, diui ‘deity’. 


Figure 1), and second the analogical creation of new forms corresponding to 
each of the original variants, thus giving rise to two different paradigms ((iii) in 
Figure 1), the existence of which must be justified on the basis of some func- 
tional or semantic difference, the basic requisite for any process of split in dia- 
chronic morphology (see García-Castillero 2013). 
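
The two steps just distinguished can be restated schematically. The following Python sketch is only an illustration under simplifying assumptions: the forms are entered by hand in their classical spelling, vowel length and the intermediate sound changes of Figure 1 are ignored, and the names `ENDINGS`, `MIXED` and `split_paradigm` are hypothetical labels introduced here. It shows how a single paradigm with two stem variants yields two full paradigms once each variant analogically fills its missing cells.

```python
# Classical endings of the four cells considered in Figure 1 (length marks omitted).
ENDINGS = {"nom.sg.": "us", "acc.sg.": "um", "gen.sg.": "i", "abl.sg.": "o"}

# Step (iib): one paradigm with internal variation. Each cell records the stem
# variant it inherited after the phonological changes (de- vs. diu-).
MIXED = {"nom.sg.": "de", "acc.sg.": "de", "gen.sg.": "diu", "abl.sg.": "diu"}


def split_paradigm(mixed, endings):
    """Step (iii): build one complete paradigm per stem variant.

    A cell is 'inherited' if the variant already occupied it in the mixed
    paradigm, and 'analogical' if it had to be newly created.
    """
    paradigms = {}
    for stem in sorted(set(mixed.values())):
        paradigms[stem] = {
            cell: (stem + ending,
                   "inherited" if mixed[cell] == stem else "analogical")
            for cell, ending in endings.items()
        }
    return paradigms


if __name__ == "__main__":
    for stem, cells in split_paradigm(MIXED, ENDINGS).items():
        print(stem, cells)
```

The sketch deliberately says nothing about the functional or semantic difference that must justify the two resulting paradigms; it models only the formal side of the split.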

Accordingly, the paradigmatic split assumed for the creation of Class B as a 
distinct paradigm from the original Class C is delineated in Figure 2, where the 
lexical preverbs aith- and uss- have been omitted. Note that the steps (ii) and 
(iii) include the forms in their usual Old Irish spelling. 


         (i)           (ii)           (iiia)                        (iiib)
                                      with non-3rd persons          with 3sg masc./neut.

         Class C       Class C        Class B        Class C        Class B      Class C
con-     *con-d-    >  con-d-         cot-am-        con-d-am-      cot-         con-d-¹
in-      *in-d-     >  in-d-          at-am-         in-d-am-       at-          in-d-
ad-      *ad-d-     >  at- /ad-/      at-am-                        at-          ad-id-
ess-     *as-d-     >  at-            at-am-                        at-          as-id-
fris-    *friθ-d-   >  frit-          frit-am-                      frit-        fris-id-
for-     *for-d-    >  fort/d-        fort/d-am-                    fort/d-      for-id-
etar-    *etar-d-   >  etart/d-       etart/d-am-                   etart/d-     etar-id-

Figure 2: Paradigmatic split from original Class C to Old Irish Classes B and C.

¹ This lexical preverb con- combined in Old Irish with the Class C 3rd person singular masculine /
neuter infix often appears as conid-"'- (e.g. Ml. 106^8 lasse conidrerp ‘when he has entrusted
himself’, from con-erbai). Though the consideration of this form conid- should also take into
account the homonymous forms of the conjunct particle co"- ‘so that’ plus the same infix and of
this conjunct particle combined with the copula, it seems that the analogical influence of the
other forms with renewed Class C 3rd person singular masculine / neuter form -id- (i.e. ad-id-
and so on) suffices to explain the form conid- in this combination of lexical preverb plus Class C
infix. I owe this observation to Elliott Lash (p.c.).


The forms given in step (i) of Figure 2 are those expected for Class C,
which is characterised by the addition of the (already) relative marker *-dV- to 
the bare form of the lexical preverb followed by the corresponding affixal pronoun.
At this stage, there is no need to differentiate between non-third and
third persons, and the regular developments assumed for each form arrived at 
the situation in step (ii) of Figure 2, with more or less transparent forms such as 
con-d-, in-d- and for-t-, etar-t-, with other forms which are most easily explained 
as due to the fusion of the final dental with the initial dental of the infix (ad-, 
aith-, fris- in so far as this is from *friθ-), but also with forms in which the final
consonant of the lexical preverb was apparently substituted by the form of the 
infixed pronoun beginning with -t- /d/ (as in the case of ess-, oss- and - taken 
at its face value - fris-). The substitution of the final consonant which can be 
assumed for *ess- > as- > a(s)-t- (and, mutatis mutandis, for oss- and friss-), 
would be of course a further case of replacive morphology, similar to that as- 
sumed in section 3.3 above for the third person singular masc./neut. forms of 
Class A with preverbs such as do-. 

On the basis of that situation, step (iiia) of Figure 2 represents the first 
move towards the differentiation between Classes B and C for (-)VC- lexical pre- 
verbs, and corresponds to McCone's explanation below of the Class B forms cot- 
and at-, from con- and in- respectively. 


It has, of course, long been realised that con-d(-) and in-d(-), which actually do occur as
class C forms in relative clauses, would be the regular outcome of the sequences *kom-de-
and *in(de)-de- in main clauses too. The simple solution is to posit analogical creation of
main-clause co-t(-) and a-t(-) with loss of the preverb's final consonant as in most other
cases such as a-t(-) < *ad-de- or *ey-de- < *ey(z)-de- in relation to ad- and as- (< *ess <
*eys) respectively or fri-t- < *wrid-de- < *wrid(z)-de-) in relation to fris (< *writs).
(McCone 2006: 229)

Apart from being a step in the development of a new distinction between Class 
B and Class C, the situation of step (iiia) in Figure 2, which includes the first 
person singular pronominal infix, reflects quite faithfully the description in sec- 
tion 4 for non-third person infixes of Class B, at least for most of the corre- 
sponding lexical preverbs. With respect to that situation, step (iiib) of Figure 2 
represents the introduction of new forms for distinguishing Class C forms for 
the third persons, i.e. the creation of new relative clause type forms for the 
third person singular pronouns, in which the distinction between declarative 
and relative is in general more systematic. 

The whole process of paradigmatic split may therefore be viewed in the 
change from step (ii) to step (iiib) in Figure 2: the horizontal line in step (ii) is 
partially put in the vertical position in step (iiia), as a consequence of the analog- 
ical creation of co-t(-) and a-t(-), whereas the remaining horizontal part of that 
line ends up in the vertical position in step (iiib), as a consequence of the analog- 
ical creation of the forms for the third persons in the other lexical preverbs. 
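
The outcome of this process for the third person singular masculine / neuter forms can be restated as a small sketch. The Python fragment below is merely illustrative and adds no new data: the forms are copied by hand from Figure 2, and the names `STEP_II`, `ANALOGICAL_B`, `PRETONIC` and `split_classes` are hypothetical labels introduced here. It assumes, as in the account above, that con- and in- receive analogical Class B forms while keeping their old Class C forms, whereas the remaining preverbs keep the old form as Class B and renew Class C by adding -id- to the bare pretonic preverb.

```python
# Step (ii) of Figure 2: a single Class C series (3sg. masc./neut.),
# keyed by lexical preverb. Forms copied by hand from the figure.
STEP_II = {
    "con-": "con-d-", "in-": "in-d-", "ad-": "at-", "ess-": "at-",
    "fris-": "frit-", "for-": "fort/d-", "etar-": "etart/d-",
}

# Analogical main-clause forms of step (iiia), following McCone (2006: 229).
ANALOGICAL_B = {"con-": "cot-", "in-": "at-"}

# Bare pretonic shape of the preverbs whose Class C form is renewed with -id-.
PRETONIC = {"ad-": "ad-", "ess-": "as-", "fris-": "fris-",
            "for-": "for-", "etar-": "etar-"}


def split_classes(step_ii):
    """Return the Class B and Class C forms of step (iiib) for each preverb."""
    result = {}
    for preverb, old_c in step_ii.items():
        if preverb in ANALOGICAL_B:
            # New Class B by analogy; the inherited form remains Class C.
            class_b, class_c = ANALOGICAL_B[preverb], old_c
        else:
            # The inherited form becomes Class B; Class C is renewed with -id-.
            class_b, class_c = old_c, PRETONIC[preverb] + "id-"
        result[preverb] = {"B": class_b, "C": class_c}
    return result


if __name__ == "__main__":
    for preverb, forms in split_classes(STEP_II).items():
        print(preverb, forms)
```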
7 Conclusion


The diachronic explanation for the Old Irish Class B of infixed pronouns argued 
for here is different to previous ones in some important respects. First, it pays 
special attention to the use of these forms in the contemporaneous Old Irish 
texts. Second, much in line with the basic tenets in García-Castillero (2015), it
considers the interaction between phonological, morphophonological, morpho- 
logical, and also syntactic aspects of the Old Irish verbal complex: in particular, 
it takes seriously the assumable phonotactic conditions of some grammatical 
distinctions such as clause type and pronominal references in lexical com- 
pounds with a (-)VC- lexical preverb, and how the morphologically undesirable 
consequences of some phonological changes can be avoided. 

The starting point of the diachronic explanation put forward in this chapter 
is a situation in which there were only Classes A and C. The forms of the third 
person singular masculine / neuter infixed pronouns of Class C were then used 
to express those persons when their Class A version had been obliterated by 
regular phonological changes with most lexical preverbs of the shape (-)VC-. 
The use of Class C instead of the vanished marking attributable to the original 
forms of Class A was facilitated in syntactic contexts in which both relative and 
declarative clause type morphology were possible, and the new ‘declarative’ 
forms were levelled through the whole paradigm, a process especially easy for 
the non-third persons, since the verbal complexes including those infixed pronouns
seem to be less in need of distinguishing between relative and declarative
clause type forms, at least in view of the frequent use of Class A instead of ex- 
pected Class C. This step in the development of Class B corresponds to the situ- 
ation assumed for the non-third persons. In the next step, the third person 
infixes created a form different to the newly created Class B by adding the end- 
ing -id- to the bare form of the lexical preverb, thus renewing the form of Class 
C in those third persons. 

This diachronic explanation does not need to go far back in the prehistory 
of the Irish language in order to explain the origin of Class B infixed pronouns 
and, in fact, it nicely fits in with the following descriptive issues observed in 
the language of the Old Irish glosses. First, it seems clear that the relevant 
factor for the use of Class B of infixed pronouns is the phonotactic structure 
of the involved lexical preverb in pretonic position, namely the structure 
(-)VC-. Second, this diachronic explanation also agrees with the default char- 
acter of the infixes of Classes A and C, which are used not only with (-)CV- 
preverbs, but also with the remaining conjunct particles. Third, other situa- 
tions of homonymy which happen in the case of some specific verbs and 
under some specific circumstances (e.g. fo-fera ‘causes’) are sometimes 
corrected by using the more visible form. Fourth, the assumption that the
third person singular masculine / neuter infixed pronouns constitute the 
locus of the whole change accords well with the fact that these forms are the 
most frequent infixes in the contemporaneous Old Irish texts, as clearly ob- 
servable in Sommer's (1897) collection of forms. Fifth, it directly explains the 
remarkable asymmetric situation found in the language of the glosses in the 
use of the non-third persons of the Class B infixed pronouns, which turns out 
to be a specific step in the assumed process of paradigmatic split; this asym- 
metry agrees with general trends in the distribution of declarative and rela- 
tive clause type marking in Old Irish. 

As a general result of this study, the consideration of the process of paradigm
split has also revealed the need for a ‘bridging context’ for this change,
which is also essential in grammaticalisation processes. This is surely not a 
matter of chance. In fact, grammaticalisation is one of the possible sources of 
a new morpheme in a given paradigm, or of a new paradigm, and the conse- 
quence of the creation of these new morphological elements is that there is a 
morphological split so that the new morpheme or paradigm expresses a spe- 
cific function or meaning, different to the meanings of the morphemes al- 
ready existing in that paradigm or to the meanings of the already existing 
paradigm(s). 
Elisa Roma

7 Nasalisation after inflected nominals 
in the Old Irish glosses: Evidence 
for variation and change 


In this chapter, I discuss the variation in the occurrence of nasalisation on de- 
monstratives, adjectives, nouns and inflected prepositions or adverbs following 
inflected nouns and adjectives in the Old Irish Würzburg, Turin, St. Gall and
Milan Glosses (henceforth Wb., Tur., Sg. and Ml. respectively; see section 1.1
below for the use of edited sources). In these contexts nasalisation is more ir- 
regular (at least in spelling) than after proclitics such as articles, possessives 
and prepositions, and apparently unpredictable. 

The data I presented at the Colloquium ‘Variation and Change in the Syntax 
and Morphology of Medieval Celtic Languages’ (Maynooth, 13 October 2017) had
not yet been published at the time and had not been presented before. Since in the
meantime they have been published in Roma (2018a), this article will not discuss 
them in detail but only report examples and data for the sake of the argument 
and dwell on some of their diachronic and synchronic implications. 

The paper is organised as follows: section 1 illustrates the contexts taken 
into account and summarises the data, according to a broad classification of pho- 
netic and syntactic environments (sections 1.2 and 1.3 respectively). Section 2 
discusses the data from a distinctive diachronic vs. synchronic perspective. 
Section 3 sums up the content of the paper and its tentative conclusions. 


1 Nasalisation after inflected nominals in the Old 
Irish glosses: The data 


1.1 Sources 


The data presented in Roma (2018a), which form the basis of the analysis pro- 
vided in this chapter, were collected as follows. Instances from Ml. were ex- 
tracted from Griffith and Stifter (2013), checking all nominal and adjectival 
entries in the dictionary, while data from Wb., Tur. and Sg. were gathered from 
Thes. and cross-checked with Kavanagh (2001), Bauer (2015)¹ and Lash (2018)²
for Wb., Sg. and Tur. respectively. 


1.2 Phonetic environments 


The occurrences of nasalisation or lack thereof were grouped according to a 
broad classification of possible phonetic and syntactic environments. This was 
devised expanding on the classification and the results of Thurneysen's (1905) 
survey, which was the most comprehensive available. Before listing the pho- 
netic environments, a caveat is in order. Nasalisation is expected to be noted in 
Old Irish spelling only on voiced plosives and on vowels: these were the cases 
taken into account in the collection of data, in search for a measure of irregular- 
ity, which points to variation (see Ó Maolalaigh 2008: 242). Methodologically, 
therefore, nasalisation as a phonetic or even morphophonological feature cannot 
be the starting point, as with present-day varieties, but is the demonstrandum. 

The phonetic environments are listed below with illustrative examples. The 
relevant nasalisation marker is highlighted in bold when examples are quoted 
in the body of the text. 

The first kind of environment groups cases where nasalisation would occur 
after or on a vowel, as in (1). This phonetic environment, where Thurneysen 
(1905) found that nasalisation was more frequent, includes cases where the trig- 
gering word ends in a vowel, as in (1d) and (11), and cases where the nasalised 
word begins with a vowel, as in (1a)-(1c) and (8), as well as instances where 
nasalisation occurs between vowels, as in (1e) and (9). Note that instances after 
a final <n> belong to a separate category, see (2) below. The examples in (1a) to
(1e) show nasalisation occurring on different word classes (or nominal case), 
i.e. on a noun after an agreeing adjective, on an adjective after a head noun, on 
a demonstrative after a head noun, on a noun in the genitive case, on an in- 
flected preposition (or adverbial), respectively. For all the other phonetic envi- 
ronments listed in this section only a single example will be given, but note 
that word class of the nasalised word has always been taken into account in 
the classification and is relevant to the frequency of nasalisation (see section 


1 Bauer (2015) was not available yet when the collection of data from Sg. began. Bauer, 
Hofman, and Moran's (2018) digital edition of the St. Gall Glosses has also been occasionally 
consulted. 

2 I warmly thank Elliott Lash for allowing me to consult the Minor Glosses Database before its 
publication in CorPH. 




1.3 below and Roma 2018a: 13). See section 1.3 for details about the classifica- 
tion according to the triggering word. 


(1) a. arnach nindocbáil móir 


for-anyacc.sc.rem “8 gloryacc greatacc.sc.rem 
‘for some great glory’ (Wb. 23°12) 


b. tiagait báas nanapaig 
8O3pr.PnEs deathacc “Sprematureacc.sc.neut 
‘They go to premature death.’ (Wb. 11°12) 

c. ambas nisin 


NAS 


theyom.sc.neur=deathyou DEICT_thatpar 


‘that death’ (Wb. 1542) 


d. ácenele ndoine 
NAS. 
theace.sc.neur=TAC€ acc MEN ¢eEn.pi 
‘the human race’ (Wb. 5°16) 
e. triguidi náirium 


through-beseeching,«. "for; 
‘through beseeching for me.’ (Wb. 7712) 


The second kind groups cases where nasalisation would occur after words end- 
ing with n before words beginning with a vowel, d or g, and words ending with 
<m> before words beginning with b: in these contexts the lack of a second nasal
on the following word may not suggest lack of nasalisation at all (but see sec- 
tion 2 for a discussion of the treatment of this context in Sg.). An example is in 
(2), where nasalisation occurs on a genitive noun. 


(2) cotíchtin nancrist
until=coming,,, ~“ Antichristeen 
‘until the coming of the Antichrist’ (Wb. 25%1) 


The third kind is defined as follows: nasalisation between dental consonants, 
i.e. all consonants which do not include a labial or velar or two plosives. When 
both consonants are plosives, the examples have been classified separately 
(see [5] below; in [3] final <d> represents a dental fricative). In (3) nasalisation 
between dental consonants occurs on a genitive noun. 


(3) rad ndé 
graCenom "God 
‘the grace of God’ (Wb. 7*3) 
The fourth kind groups cases where nasalisation would occur between (non-dental)
plosives, as in (4), where nasalisation occurs on an adjective.


(4) 27724 coscrad indeseircc mbráthardi 
that-NEG-destroysc¢.pres.susy theacc.sc.rem=lOV€ acc “Sbrotherlyacc.sc.rem 
‘lest it should destroy the brotherly love’ (Wb. 10°1) 


The fifth environment groups instances where nasalisation is expected between 
dental plosives, as in (5), where nasalisation occurs on an inflected preposition. 


(5)  suidigfith dia recht ndo 
establishss; cur GOdnom laWacc NA tO3sc wasc 
‘God will establish a law for him.’ (Ml. 46°20)


Lastly, nasalisation may occur between all other consonants, as in (6), where 
nasalisation occurs on a genitive noun. 


(6) isdered mbetho inso 
iS3s¢.pres=CNdnom “Sworldcrn theyom.sc=thisnom 
‘This is the end of the world.’ (Wb. 10°3) 
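
The six phonetic environments just listed can be summarised as a decision procedure. The Python sketch below is only an approximation under stated assumptions: segments are represented by abstract phonological classes rather than by manuscript spelling, the names `Seg` and `environment` are hypothetical, and the treatment of plosive sequences with at least one non-dental member as the fourth environment is my reading of the classification, not a rule stated in the survey. The numbers returned follow the order of presentation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seg:
    """Abstract segment class (an assumption of this sketch, not a spelling)."""
    symbol: str
    vowel: bool = False
    dental: bool = False    # neither labial nor velar
    plosive: bool = False


V = Seg("V", vowel=True)                  # any vowel
N = Seg("n", dental=True)                 # final nasal <n>
M = Seg("m")                              # final nasal <m>
T = Seg("t", dental=True, plosive=True)
D = Seg("d", dental=True, plosive=True)
DH = Seg("dh", dental=True)               # dental fricative, spelt <d> finally
B = Seg("b", plosive=True)
G = Seg("g", plosive=True)
K = Seg("k", plosive=True)                # spelt <cc> in example (4)


def environment(final, initial):
    """Classify the junction between trigger-final and target-initial segment."""
    after_n = final.symbol == "n" and (initial.vowel or initial.symbol in ("d", "g"))
    after_m = final.symbol == "m" and initial.symbol == "b"
    if after_n or after_m:
        return 2                          # after a final nasal, as in (2)
    if final.vowel or initial.vowel:
        return 1                          # after or on a vowel, as in (1)
    if final.plosive and initial.plosive:
        # Both plosives: dental pair as in (5), otherwise as in (4).
        return 5 if final.dental and initial.dental else 4
    if final.dental and initial.dental:
        return 3                          # between dental consonants, as in (3)
    return 6                              # all other consonant sequences, as in (6)


if __name__ == "__main__":
    print(environment(DH, D))             # rad ndé, example (3)
```

Checking the nasal context before the vocalic one reflects the statement that instances after a final <n> form a separate category even when the following word begins with a vowel.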


The results of the survey are summarised below. I refer to Roma (2018a) for 
quantitative data. 

In Wb. nasalisation is regular only on agreeing nouns, adjectives and on de- 
monstratives, where it mostly occurs after/on a vowel (context as in [1] above). It 
tends not to be noted after nasals (context as in [2] above) on any following 
word. On genitive nouns it occurs frequently after/on vowels, and is attested in 
all interconsonantal environments except between dental plosives. On preposi- 
tions nasalisation is rarely found, and never in any interconsonantal position. 

In Sg. nasalisation after/on a vowel is always found on agreeing nouns, ad- 
jectives and on demonstratives; it is regular on genitive nouns in any phonetic 
context, while on inflected prepositions it is much more frequent than in Wb., 
but hardly attested between consonants and never between dental consonants 
(12 instances without). It tends to be regularly noted after final nasals, and a 
nasal also appears sometimes when the following word beginning with a vowel 
is spelt with an initial <h>, as in (7).³


3 Three instances in Sg. (see Roma 2018a: 5). This cannot have any connection with the emer- 
gence of a voiceless glottal fricative after or between nasal vowels, in the position once occu- 
pied by a nasal consonant, noted by Ó Maolalaigh (2003: 117) for modern Gaelic dialects. See 
Schrijver (1997a: 219—220). 

(7) ni fail chumscugud nhuirdd and
NEG-be.3SG.PRES change.ACC NAS-order.GEN in.3SG.NEUT.DAT
‘There is no change of order there.’ (Sg. 215?2)


In Ml. nasalisation occurs in all phonetic environments. Although between den- 
tal consonants it tends to be avoided, between non-dental plosives and be- 
tween other consonants it is attested in a good number of instances on genitive 
nouns (9/19) and prepositions (10/19). Nevertheless, it is hardly spelt after final 
nasals on genitives and prepositions. 

In Tur. the examples are few, but nasalisation is regular and there is one 
instance of nasalisation between plosives (echtar comairbirt mbiuth pecthae 
‘outside the practice of the sins’ [Tur. 108]). 

The survey presented in Roma (2018a) and summarised here confirms 
Thurneysen’s observation (1905: 1) that nasalisation is never marked on an initial 
d- after a final -l (18 instances in Wb., 10 in Ml., 1 in Sg.), or between a final -m 
(fricative) and an initial g- (only occurring in 2 instances in Wb.). While for the 
latter context the evidence is too meagre to allow any conclusion, for the former 
it could be argued that a nasal consonant was avoided (and, possibly, not admitted
in some varieties).⁴ Nevertheless, counterexamples apparently occur outside
my corpus: iarthimcul ndí ‘after the circuit by it (the sun)’ (Thes. 2: 33.22 [Vienna
Bede 23]; on an inflected preposition), where however nasalisation is unexpected 
after a dative singular noun; frisinnaraim n grecdi ‘to the Greek number’ (Thes. 2: 
34.28 [Vienna Bede 31]; on an adjective). Be that as it may, in other interconso- 
nantal contexts the syntactic environment seems crucial: for example, in Wb. 
there are 3 instances of nasalisation on adjectives out of 8 expected in intercon- 
sonantal position (-s mb-, -cc mb-, -cc ng-), but there are none on a preposition in 
any interconsonantal environment (0/97). 


1.3 Syntactic environments 


The syntactic environments where nasalisation occurs after inflected nominals 
have been grouped according to case, gender and number of the triggering word 
as well as according to word class of the nasalised word (see above section 1.2). 
They are listed below. 


4 The nasal in amal ṅdondfoirde ‘as signifies it’ (Sg. 26°12), in a different syntactic context, is
unusual, and Thes., followed by Bauer (2015), suggests correcting it to dondfoirride. 

5 These are the only two instances of nasalisation in these phonetic environments in the 
Minor Glosses Database (Lash 2018). Lenition after iar in Vienna Bede 23 is also irregular. 

Nasalisation after a singular nominative neuter noun is exemplified in (1c), (3)
and (6). Besides the five word classes listed in section 1.2 and exemplified in (1a) to 
(1e) respectively, it also occurs once on an agreeing noun in apposition, as in (8). 


(8) sliab nossa 
mount.NOM NAS-Ossa.NOM
‘Mount Ossa’ (Sg. 63716) 


Nasalisation after an accusative noun is exemplified in (1a), (1b), (1d), (1e) and
(2). It may also occur on an agreeing noun in apposition, as in (9). 


(9) fridia nathir 
to-God.ACC NAS-father.ACC
‘towards God the father’ (Ml. 1278)


Nasalisation after a genitive plural noun is exemplified in (10). 


(10) itseúit macc ngor
is.3PL.PRES=treasures.NOM.PL sons.GEN.PL NAS-pious.GEN.PL.MASC
‘They are the treasures of pious sons.’ (Wb. 23°9)


Nasalisation after a singular nominative neuter adjective is exemplified in (11). 


(11) isinse nduit
is.3SG.PRES=hard.NOM.SG.NEUT NAS-to.2SG
‘It is impossible for you.’ (Wb. 5°28)


Nasalisation after a noun phrase with an accusative or a nominative neuter 
noun+adjective is extremely rare: it only occurs once in Ml. 40°20, reported in 
(12), out of 11 similar syntactic environments. 


(12) ata debe mec nand 
PV-be.3SG.PRES difference.NOM NAS-little.NOM.SG.NEUT NAS-in.3SG.NEUT.DAT
‘There is a little difference there.’ (Ml. 40°20)


The data clearly show that word class of the nasalised word is relevant for the 
occurrence of nasalisation. Nasalisation surfaces in the glosses according to the 
hierarchy in (13) below (see Roma 2018a: 13): 

(13) nasalisation on agreeing words and demonstratives > nasalisation on
nominal noun modifiers > nasalisation on any modifier following a noun 


I am aware that this picture is somewhat biased by the following circumstances:
- prenominal adjectives are mostly proclitic (cach ‘every’ and nach ‘any’ are
  the most frequent cases; there are only 3 examples with stressed cétnae
  ‘first’ and 2 with uile ‘all’ in Ml.; 3 with cétna in Sg.; 1 with uile in Tur.);
  they can therefore be assumed to behave like proclitic mutation-triggers
  such as articles rather than like mutation-triggering inflected stressed
  words;
- demonstratives which can show nasalisation all begin with a vowel (ucut)
  or a vowel-initial deictic particle (isin, ísiu).


Nevertheless, nasalisation is clearly shown more frequently on adjectives than 
on genitives and prepositions, as already noted by Thurneysen himself (1905; 
GOI § 237). In Wb., nasalisation on prepositions is confined to the phonetic context
after/on a vowel and does not occur after genitive plural nouns; genitive
nouns are mostly not nasalised (overall 67 vs. 40), although nominative singu- 
lar neuter nouns mostly nasalise a following genitive (18 vs. 12). The same does 
not hold for Sg. and Ml. In Sg., nasalisation on prepositions is well attested 
whatever the trigger except neuter adjectives (context (11) above), and even pre- 
dominant after an accusative noun (12 vs. 9, overall on prepositions 19 vs. 33); 
nasalisation on genitives is largely predominant whatever the trigger (37 vs. 14). 
In Ml., nasalisation on prepositions is again frequent after most triggers ex- 
cept neuter adjectives (overall on prepositions 61 vs. 99, after neuter adjec- 
tives 7 vs. 16) and predominant on genitive nouns (84 vs. 50). 

Perhaps surprisingly, the strongest nasalisation triggers in Wb. are singular 
neuter nouns, in Ml. genitive plural nouns. In Sg. all nasalisation triggers except
neuter adjectives nasalise between 70% and 80% of expected cases.


2 Diachronic and synchronic variation 


According to Thurneysen (GOI § 237), the frequent omission of the nasal in interconsonantal
position is due to the regular dropping of a nasal in some consonant clusters.
Thurneysen presumably based this conclusion on his own survey (Thurneysen
1905), which showed that the nasal was more frequently omitted in interconsonantal
position. Nevertheless, this view implies that a nasal was lost between consonants
through a highly irregular phonetic process (see below).

Thurneysen's phonological explanation for the lack of nasalisation does
not address why the presence of a nasal between vowels apparently increases
over time, or at any rate differs greatly from one corpus of glosses to another in
certain syntactic environments: in the cases where nasalisation would occur on
a vowel and/or follow a vowel (phonetic context as in [1]), in Wb. 70 instances
show nasalisation and 115 do not, in Sg. 68 show nasalisation and 24 do not, and
in Ml. 189 show nasalisation and 38 do not. Even if one excludes nasalisation
on prepositions, which appears to be generally rarer, the corresponding figures
are Wb. 59 with nasalisation vs. 29 without, Sg. 54 with vs. 8 without, Ml. 145
with vs. 15 without. Of course, consonant dropping in interconsonantal position
cannot be the reason for this kind of variation.
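
The counts just quoted are easier to compare as proportions. The following is a minimal sketch, not part of the original survey: it only converts the figures given in the paragraph above into percentages, and the names `ALL_WORD_CLASSES`, `WITHOUT_PREPOSITIONS`, `nasalisation_rate` and `rates` are hypothetical labels introduced here for illustration.

```python
# Counts copied from the paragraph above: (with nasalisation, without nasalisation)
# in the phonetic context after/on a vowel. The variable names are illustrative only.
ALL_WORD_CLASSES = {"Wb.": (70, 115), "Sg.": (68, 24), "Ml.": (189, 38)}
WITHOUT_PREPOSITIONS = {"Wb.": (59, 29), "Sg.": (54, 8), "Ml.": (145, 15)}


def nasalisation_rate(with_nas: int, without_nas: int) -> float:
    """Share of expected cases in which the nasal is actually written."""
    return with_nas / (with_nas + without_nas)


def rates(counts: dict) -> dict:
    """Percentage of written nasals per corpus, rounded to one decimal place."""
    return {
        corpus: round(100 * nasalisation_rate(with_nas, without_nas), 1)
        for corpus, (with_nas, without_nas) in counts.items()
    }


if __name__ == "__main__":
    print("all word classes:    ", rates(ALL_WORD_CLASSES))
    print("without prepositions:", rates(WITHOUT_PREPOSITIONS))
```

On these figures the nasal is written in roughly 38% of expected cases in Wb., 74% in Sg. and 83% in Ml.; once prepositions are set aside the shares rise to roughly 67%, 87% and 91%, so the ranking of the three corpora stays the same.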

It could be assumed that nasalisation was left out in spelling in Wb., al- 
though it was realised phonetically, or, vice versa, that Wb. reflects a variety 
where nasals were dropped more easily between consonants. The former hy- 
pothesis would be surprising for initial vowels, and neither hypothesis is sup- 
ported by independent evidence. Table 1 below reports the spelling of a few 
sample words with interconsonantal nasals in the three major corpora of 
glosses. 


Table 1: Spelling of n between consonants in the interior of words (sample lexemes).

Lexeme                                          Wb.                 Sg. and other       Ml.
                                                                    Priscian glosses
                                                with n    without n with n    without n with n    without n
frecndairc ‘present’, frecndarcus ‘presence’    11        -         27⁶       1         20        -
aisndís ‘declaration’ and related words         4         -         7         2         58        1
  (aisndisse, -i)
forngaire ‘command’                             2         -         -         4         2         4
túailnge ‘ability’ and related words            1         -         -         -         4         1
  (túailngigidir, túailngigiud, túailngigthe)


6 Once without d.

The comparison between Table 1 and the data in section 1 shows that the frequencies
with which the nasal is spelt between consonants in the interior of
words and in nasalisation contexts do not match. The behaviour of clusters
arising from syncope, such as those mentioned by Feuth (1982: 92) and Ó
Maolalaigh (1995-1996: 164) and illustrated in Table 1, is different from the behaviour
of purportedly similar phonetic contexts between words. In Wb., while
the nasal is regularly spelt in the interior of words, it is most frequently omitted
between words, as shown in Roma (2018a) and noted above (sections 1.2 and 1.3).
It may be added in this connection that in Ml. there occur a few examples⁷ where
nasalisation surfaces on a simple, proclitic preposition, thereby appearing in a
phonetic environment which resembles more closely the unstressed position of
the interior of words. But, again, the spelling of the nasal in the interior of words
in Ml. does not seem to be more regular than in the other corpora of glosses.
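
For orientation, the word-internal spellings in Table 1 can be summed per corpus. This is only a rough sketch under stated assumptions: a dash in the table is read as zero, the poorly legible "without" cell for aisndís in Ml. is assumed to read 1, and the names `TABLE_1`, `corpus_totals` and `spelt_share` are hypothetical labels introduced for this illustration.

```python
# Table 1 as (with n, without n) pairs per corpus; a dash in the table is read as 0.
# ASSUMPTION: the Ml. 'without' cell for aisndís is hard to read and is taken as 1.
TABLE_1 = {
    "frecndairc": {"Wb.": (11, 0), "Sg.": (27, 1), "Ml.": (20, 0)},
    "aisndís": {"Wb.": (4, 0), "Sg.": (7, 2), "Ml.": (58, 1)},
    "forngaire": {"Wb.": (2, 0), "Sg.": (0, 4), "Ml.": (2, 4)},
    "túailnge": {"Wb.": (1, 0), "Sg.": (0, 0), "Ml.": (4, 1)},
}


def corpus_totals(table: dict) -> dict:
    """Sum the (with n, without n) pairs over all sample lexemes, per corpus."""
    totals = {}
    for per_corpus in table.values():
        for corpus, (with_n, without_n) in per_corpus.items():
            seen_with, seen_without = totals.get(corpus, (0, 0))
            totals[corpus] = (seen_with + with_n, seen_without + without_n)
    return totals


def spelt_share(with_n: int, without_n: int):
    """Share of tokens spelt with n, or None when a corpus has no tokens."""
    total = with_n + without_n
    return with_n / total if total else None


if __name__ == "__main__":
    for corpus, (with_n, without_n) in corpus_totals(TABLE_1).items():
        print(corpus, with_n, without_n, spelt_share(with_n, without_n))
```

Under these assumptions the sample lexemes give 18 spellings with n out of 18 in Wb., 34 out of 41 in Sg. and other Priscian glosses, and 84 out of 90 in Ml. The sample is small and lexically skewed, so the totals only illustrate the contrast with the between-word figures; they are not a substitute for the survey data.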

Indeed, different spelling conventions apparently hold for nasalisation on 
consonants and vowels after final nasals in the St. Gall Glosses (see 1.2 above), 
where nasalisation tends to be spelt regularly (72%), as opposed to both Wb. 
(5%) and Ml. (42%). This corresponds to Quin’s (1979: 256, 258) data on the
spelling of nasalisation after the accusative article, since Sg. seems to mark nasalisation
after the accusative article in more frequently. This spelling could also reflect a
different phonetic realisation. Note that Sg. also features 6 instances of nasalisation
after Latin words ending in a nasal (the neuter nouns nomen, pronomen
and cognomen, see Roma 2018a: 19).⁸

Bronner (2016) has shown that in the Additamenta in the Book of Armagh 
the kind of nasalisation dealt with in this paper is almost regularly spelt differ- 
ently from nasalisation after proclitics such as articles, possessives, prepositions, 
conjunct particles and infixed pronouns: only this kind of nasal is written sepa- 
rately between mid-height dots. Note that nasalisation seems to be regularly 
spelt in the Additamenta, but there are only 5 instances on an inflected preposition,
of which 3 are of and ‘in it’; moreover, only 2 instances occur between consonants,
1 of which involves an adjective. This last is reported in (14).⁹


7 Ml. 30°10, 46°1, 51?5, 72°9, 2355, 110910, 9613. See Roma (2018a: 12) for details. 

8 Among these, pronomen is quite clearly a borrowing (so in eDIL), given the dative plural 
form pronoibneib (Sg. 20056); cf. nasalisation after the accusative singular but no nasalisation 
after the dative singular in pronomen natárcadach vs. o pronomen atarcadach ‘an anaphoric 
pronoun’ (Sg. 209^10).

9 Nasalisation is not marked in contubart fland feble acheill dóo ‘Fland Feblae gave his 
church to him’ (Thes. 2: 242.20-21 [Book of Armagh, folio 18"?34-5]) (see above section 1.2 on 
the absence of nasalisation between -l and d-); possibly also in cach aleth ódib ‘each of them 
his way’ (Thes. 2: 240.20 [Book of Armagh, folio 1828]).

(14) arech n: donn
for=horse.ACC NAS-brown.ACC.SG.MASC
‘for a brown horse’ (Thes. 2: 240.1 [Book of Armagh, folio 17"?29])


This kind of spelling seems to be an alternative to the superscript dot or punc- 
tum delens — which never occurs in the Additamenta (Bronner 2016: 42) — and is 
therefore considered by Bronner (2016: 43, 45) a chronologically and/or geo- 
graphically restricted variant of such spelling, probably to be ascribed to an ex- 
perimental phase. The flanking dots of the Additamenta occur once in Ml. and 
once in Sg. (Ml. 69°23, Sg. 6^11), according to Bronner (2016: 44), but, perhaps 
significantly, not in Wb. 

Despite spelling variation, though, a purely graphic principle that could 
account for the observed variation between the corpora of glosses is highly 
unlikely. We must therefore conclude either that nasalisation was expanding
or that Wb. and Ml. reflect the distribution of nasalisation in two different varieties.
The situation in Sg. is closer to Ml., but the regular spelling of nasalisation
after a final nasal seems to point to a somewhat different variety (see
below about another peculiarity of Sg.). We turn therefore to possible diatopic 
and diachronic scenarios accounting for the observed variation in the occur- 
rence of nasalisation in the contexts examined here. Given the nature of the 
sources under consideration, the discussion is confined to the diatopic and 
diachronic dimensions, bearing in mind that register can in principle account 
for variation within every single corpus of glosses (McCone 1985: 102; Ahlqvist 
1988: 27). 

Ó Muircheartaigh (2015: 128), in giving a summary of recent scholarship 
on dialectal variation in Old Irish, suggests that the homogeneity of the Old 
Irish standard language of the glosses may prevent us from finding dialectal 
variation in the Early Irish period through comparison of the corpora of glosses. 
He claims that a detailed examination of the distribution of forms in the 
glosses may not lead to a full understanding of any dialectal differences and 
may prove to be fruitless, because the three corpora might all belong to the 
same larger dialect area, situated in North-Eastern Ireland between the mon- 
asteries of Armagh, Bangor and Iona. Ó Muircheartaigh (2015: 124—125) admits 
that some of the features outlined by GOI and reiterated by Ahlqvist (1988) as 
possible dialectal features in the glosses, such as for instance the superlative 
suffix (-em vs. -imem), the form of the demonstrative són vs. ón, the form of 
the reflexive céin vs. féin, may reflect dialectal variation. However, although 
the variation between the anaphoric pronouns ón and són may indeed be one of 
dialect, for example, these anaphoric pronouns have left no trace in the modern 
languages, leaving their geographical implications unknown, as concluded by
Ahlqvist (1988: 26). 

The major corpora of glosses may well belong to the same broad dialectal 
area. Still, even if later varieties cannot offer direct evidence to locate the out- 
comes of alternative forms in the Old Irish glosses, an attempt might be made 
to account for variation, if it surfaces, as I believe is the case for nasalisation. 
In principle, I expect that variation in mutation patterns of this kind in Old
Irish could at first escape both morphological and spelling normalisation. 
Bronner’s (2016) findings concerning the spelling in the Book of Armagh con- 
firm this expectation. 

Moreover, despite the fact that the distribution of nasalisation after in- 
flected nominals in Old Irish cannot be plotted with respect to modern Gaelic 
dialects, as this kind of nasalisation has been lost throughout all varieties, its 
blocking vs. expansion may lie behind the diverging developments of nasalisa- 
tion in Scottish Gaelic and Irish. 

Drawing on the discussion in Ó Maolalaigh (1995-1996), in Roma (2018a) I 
suggested that lower frequency of nasalisation across phrasal boundaries and 
between consonants, as witnessed by Wb., could be linked to the eventual 
loss of nasalisation in many contexts, and therefore to the Scottish varieties.¹⁰
Nevertheless, Ó Maolalaigh's (1995-1996: 165) explanation for the loss of na- 
salisation following particles with consonantal codas presupposes the loss of 
nasalisation in interconsonantal position, a view that I am not inclined to ac- 
cept, for the reasons shown above. In fact, my previous assumption concern- 
ing lower frequency of nasalisation simply pushes back in time the common 
Gaelic variety that lies behind both Irish and Scottish, but, given the inconsis- 
tencies between loss of interconsonantal nasals and absence of nasalisation 
after inflected words, it does not suggest any pathway for the emergence of 
the new pattern. 

Therefore, it is better to assume that while Wb. must indeed reflect an older 
or at least a more conservative variety (of course in this respect, i.e. regarding 
the spread of nasalisation across phrases), Sg. and Ml. might reflect two similar
but possibly diatopically distinct later varieties. In either case, the variety in 
Wb. could indeed reflect a stage which preserved a sequence of nasal + voiced 


10 According to Ó Maolalaigh (2008: 232) some forms in the Notes in the Book of Deer may 
suggest that nasalisation of nominal modifiers following accusative singular nouns was not 
the norm in certain varieties of Gaelic in 12th-century eastern Scotland. However, other explanations
are possible for acuitid thoisig ‘his toisech’s portion (?)’ and incathraig ele ‘the other
monastery’ or ‘into another monastery’.
plosive (i.e. where orthographic clusters were phonological clusters, as assumed
for Old Irish in general by Quin 1979, Feuth 1982 and Ó Maolalaigh
1995-1996, 2008), while the variety in Ml. would reflect a stage, and possibly a diatopic
variety, where these were reduced to a single segment,¹¹ as in Modern Irish (as assumed
by Ahlqvist 1994 and McCone 1994).

The regular spelling of nasalisation after nasals in Sg. (see section 1.2) 
could be linked to the reassignment of the nasal segment in Scottish Gaelic, 
where only proclitics with nasal codas nasalise. It might be relevant to note in 
this connection that Sg. even features a nasal after inflected óen, which otherwise
usually forms a compound with the following lexeme, in 201°6, reported in
(15) (see also Bronner 2016: 39, about genitive secht -n- delbich ‘septiformis’, in 
the Book of Armagh). 


(15) fornóin n deilb 
on=the.ACC/DAT.SG.FEM=one NAS-paradigm.ACC/DAT
‘according to one paradigm’ (Sg. 201°6) 


This phrase also contains what looks like an aphaeretic form of the article 
(namely, n or n) after a preposition presumably governing the accusative (but 
see below about a similar example where deilb would seem to be dative) 
(Strachan 1903b: 488). This form of the article is also attested in 2°2 and 45°19,
reported in (16) and (17) respectively. 


(16) etarndirainn 
between=the.ACC.DU.FEM-two.ACC.FEM=part.ACC.DU
‘between the two parts’ (Sg. 2°2) 


(17) eterndan¹² ulla
between=the.ACC.DU.NEUT-two.ACC.NEUT NAS-ulla
‘between the two ulla’ (Sg. 45°19) 


11 In my data there appears to be one phonological spelling in ata debe mec nand ‘there is a 
little difference here’ (Ml. 40°20 = [12] above), where mec for nasalised bec ‘little’, which
would usually be spelt as <mbec>, points to eclipsis, i.e. single segment, for Ml. Given the vari- 
ation accounted for in this paper I would not extend to the other corpora this conclusion re- 
garding the phonology of nasalisation of voiced plosives. 

12 In the manuscript this nasalising n is followed by a flanking mid-height dot and is sepa- 
rated from the nasalised Latin word ulla, which follows on a new line (manuscript image in 
the virtual library Codici Electronici Sangallenses, consulted through Bauer, Hofman, and 
Moran [2018]). 

In far nóendeilb ‘according to the same paradigm’ (Sg. 902), the same phrase
as in (15), deilb seems to be a dative form, since the phrase parallels far cétnu
diull ‘according to the first declension’, where adjective (cétnu) and noun (diull)
are both clearly in the dative case. Therefore, the aphaeretic form of the article
in nóendeilb would be a dative form. GOI (§ 467) in fact reports aphaeresis of
monosyllabic forms of the article after r and preceding a numeral. Nevertheless,
one example from Sg., (18) below, has eluded both the list in GOI and that in
Strachan (1903b).


(18) arbertar asnóentairmoirciunn¹³
PV-express.3PL.PRES.PASS from=the.DAT.SG.NEUT=one.ending.DAT
‘They are expressed by the same ending.’ (Sg. 33719a) 


Although Bauer (2015) classifies as+nasalisation here as a relative form of the 
copula followed by relative nasalisation, the phrase clearly parallels Latin ex 
eadem forma and is to be read as the preposition a(s)+aphaeretic form of the 
dative article.¹⁴ It is tempting to see in this form of the article, which looks like
a nasal segment floating onto the noun and becoming a nasalisation nasal
(both on vowels and on consonants,¹⁵ albeit only after prepositions) in attested
phases of the language, an incipient process which led to the generalisation of 
a nasalising article, as in Scottish Gaelic. This process would be the mirror 
image of the reanalysis proposed by Ó Maolalaigh (2016: 88-90) for the devel- 
opment of is ann as a topicalisation marker of non-nominal elements in 
Scottish Gaelic, i.e. the reinterpretation of the relative nasal segment following 
the copula as an independent morpheme (the inflected preposition ann). 

If the link proposed above between Sg. and Scottish Gaelic is not fallacious, 
the spread of nasalisation across phrasal boundaries witnessed by Ml. could, 
on the other hand, reflect an Irish variety without a direct extant offspring. This 
view relies on the observation that the drift towards the expansion of nasalisa- 
tion as a phrasal marker, even affecting proclitic prepositions, does not seem to 


13 The length mark seems rather on <o> than on <e> (despite oén- in Thes., Bauer [2015], 
Bauer, Hofman, and Moran [2018]). 

14 I doubt whether this may suggest an alternative origin for nasalisation after the preposition 
ós ‘above’ or rather adds to the analogical origin proposed by Ó Maolalaigh (2016), i.e. contamination
with the relative form of the copula as.

15 The form in tresngné ‘through the type’ (Sg. 73°1) is probably not “the masc. form of the
article for the neuter gné” (Bauer 2015), despite Strachan’s (1903b: 488) statement that it is
“undoubtedly for tresin ngné”, but simply the nasalising accusative singular neuter article,
again without the initial vowel, i.e. for tresangné (so eDIL s.v. tre).
have been continued any farther than the stage reflected in these glosses. The
position of Tur. is very doubtful because the available data are scanty. 
Nevertheless, this corpus can be paired with Ml., since nasalisation appears 
regularly. 

The two alternative scenarios surmised in Roma (2018a) and suggested 
here can be sketched as in (a) and (b) in Figure 1 below, respectively, where the 
horizontal axis roughly reflects diachronic relationship (distance along the time 
axis) and the vertical axis roughly reflects diatopic variation (distance along 
the spatial axis). The tentative nature of the hypothesis that links variation 
within Old Irish with later dialects is represented by question marks. 


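[Figure 1: diagram not reproduced. Each panel has a horizontal "time" axis and a vertical "space" axis. Panel (a) shows Wb., Sg. and Ml. (Tur.) with ?Scottish and ?Irish; the labels recoverable for panel (b) are Sg., Ml. (Tur.), ?Scottish and ?Irish.]

Figure 1: Diachronic and possible diatopic variation in Old Irish.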


For the reasons outlined above, I maintain that the scenario in (b) is more 
likely. Nevertheless, I cannot draw any well-founded conclusion, except that 
the data on nasalisation after inflected nominals point to an early differentiat- 
ing feature between Gaelic dialects; this split may be either already reflected in 
the different pictures offered by Wb., Sg. and Ml., or in the later stage docu- 
mented by Sg. and Ml., which in some varieties, lost to us for lack of documen- 
tation, may have gone even further. 


3 Conclusions 


In this chapter I have tried to show that the absence of nasalisation after inflected
nominals in Old Irish cannot be due in the first place to the loss of a nasal consonant
in consonant clusters. Variation across the major corpora of Old Irish glosses
is not trivial and must be due to diachronic change: the absence of nasalisation
in some consonantal environments and across phrases is a conservative feature, as
opposed to its later expansion. The variation is possibly also due to dialectal or, as a
consequence, register variation, although it must be acknowledged that we do not have
sound independent evidence for this. Possible connections between the variation in the
Old Irish glosses and the divergent developments of nasalisation in Irish and
Scottish varieties respectively are in fact hard to determine at the current state of
research, but it is suggested here that the St. Gall Glosses might reflect a variety
that lies behind the developments of Scottish dialects. Since it has been suggested
on different grounds that nasalisation may have been one of the earliest differentiating
features between Gaelic dialects (Ó Maolalaigh 2008: 247), and that nasalisation
transfer is crucial for the Scottish developments (Ó Maolalaigh 1995-1996:
165-167, 2008: 248), further inquiry along these lines might prove fruitful.


Acknowledgement: I am indebted to the editors and to three anonymous re- 
viewers for comments and suggestions which greatly helped to improve an ear- 
lier version of this paper. 


Jürgen Uhlich 

8 On the obligatory use of a nasalising 
relative clause after an adjectival 
antecedent in the Old Irish glosses 


1 Introduction 


According to Thurneysen (GOI § 383), “an adverb formed from the dative of the 
adjective cannot be used in periphrasis with the copula before its clause ... 
[Instead,] the adverbial form is replaced by the nominative sg. neuter of the ad- 
jective . . ., and a nasalizing relative clause follows”. The same cleft-sentence 
construction is referred to in the second part of § 498,¹ and the examples given²
are: 


(1) arimp maith n-airlethar a muntir
so.that-COP.3SG.PRES.SUBJ good.NOM.SG.NEUT NAS-care.3SG.PRES.SUBJ his household.ACC
‘that he care well for his household’ (Wb. 28b32)

(lit. ‘so that it may be good/a good thing how he cares . . .’ [GOI § 383] as
opposed to lit. ‘. . . well that . . .’, with an adverbial antecedent)


(2) is lerithir inso no nguidim-se dia 
COP.3SG.PRES zealous.EQUAT the-this.ACC PV-NAS-beseech.1SG.PRES=1SG God.ACC
‘as zealously as this do I beseech God’ (Wb. 27°19; author's trans.) 


(3) is dinnimu do-ngni alaill 
COP.3SG.PRES careless.COMPAR PV-NAS-do.3SG.PRES other.ACC
‘It is more carelessly that he makes the other.’ (Wb. 4°33) 


Moreover, while “a nasalizing relative clause can be replaced by a formally in- 
dependent (i.e. principal) clause in almost every instance, ... this is not 


1 A more detailed description and evaluation of this construction is given by Mac Coisdealbha 
(1998: 155-157; cf. 257, n. 82). On some of his interpretations of specific cases see individually 
below. 

2 Here quoted from Thes., with added editorial macrons and, for these introductory examples, 
word division and hyphenation. For the length in in so and words of similar structure, see 
Breatnach (2003). 


Open Access. © 2020 Jürgen Uhlich, published by De Gruyter. This work is licensed under
the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
possible . . . in the constructions described in §§ 499,³ 501,⁴ nor after a neuter
adjective in periphrasis with the copula (§ 498)” (GOI § 505).

According to Thurneysen, therefore, the construction under consideration 
here presents one of the very few grammatical contexts in which a nasalising 
relative clause is predictable.⁵ If borne out by the detailed re-examination of
the material undertaken here, this would allow a confident assessment - within 
this particular context — of how the formal characteristics of the nasalising rela- 
tive construction may be affected by adverse linguistic developments. To give 
two examples: (i) it would be possible to ascertain that in a putative case *is 
cosmail asbeir ‘similarly he says’, it is (at least statistically) more likely that the 
expected relative nasalisation in asmbeir has been suppressed in interconso- 
nantal position than that this should be counted among the symptoms for the 
nasalising relative construction as such already being in the process of loss (see 
also the discussion in Roma 2018a, and Roma, this volume); (ii) being able to
rely securely on the presence of a nasalising relative construction would also 
allow one to draw conclusions on the use of the different classes of infixed pro- 
nouns in relative sentences overall. For the only other construction that 


3 The specific figura etymologica construction of a verb connected to its own verbal noun as 
antecedent in an adverbial relative construction (of the pattern “the deliverance wherewith he 
delivered"), for which see further Stüber (2010-2012). 

4 With an object antecedent, where the nasalising relative alternates synchronically with a 
leniting relative; see Schrijver (1997b: 91-113). 

5 This ruling has been variously questioned by citing formally deviant examples; see 
Pedersen (1899: 391, 413, 414), Mac Coisdealbha (1998: 155), Ó hUiginn (1986: 58) and Isaac (in 
Mac Coisdealbha 1998: 257). Their objections and supporting examples will be assessed in sec- 
tions 5 and 6. 

6 The very initial symptoms of this linguistic innovation are described in GOI § 506. For the
(partly sporadic) loss of interconsonantal nasals see GOI § 180 (2-3); Thurneysen (1905: 1-2),
cf. Quin (1979-1980: 256). To the examples given there need to be added cases that show that
even a grammatically functional nasal could be suppressed in this way (as is merely hinted at
in GOI § 504 [c]): thus, while there are numerous instances where nasalisation is expressed
between consonants, such as arnach n dermandadar dia ‘that God should not forget him’ (Ml.
3285), its loss is seen in connach[n]gabad huall de ‘that pride might not seize him’ (Ml. 69°17);
cf. further the parallel examples of indhuall rodngabsom ‘the pride that had seized him’ (Ml.
6171), vs. huanduaill rod[n]gab ‘by the pride that seized him’ (Ml. 49°3) (‘[n]’ in both cases
inserted by me). Accordingly, while Schrijver (1997b: 97, 100) differentiates between cases like
asrect maid asmbeir do airiuc túas, ‘that it is a good law, which he says above he has found’ 
(Wb. 310), and taidbsin afirinne asber ‘it is an exposition of his righteousness which he utters’ 
(MI. 40715), as showing variation between nasalising and leniting relatives after object antece- 
dents, the second example - together with numerous others of similar structure listed in his 
footnotes (Schrijver 1997b: 97, nn. 1-2, 100, nn. 1, 3-5) - may also be taken as showing sup- 
pression of nasalisation. 
according to Thurneysen strictly predicts a nasalising relative, the adverbial figura
etymologica, a full re-examination of the Old Irish material has already
been carried out by Stüber (2010-2012), confirming that the nasalising relative
is indeed compulsory, and while there are no instances of the pattern *as[m]beir
in her collection, a couple of cases that involve infixed pronouns are worthy
of note (and will be addressed below).


2 Differentiations 
2.1 Adverbial cleft sentence 


Before addressing the adjectival cleft construction itself, it will be useful to
differentiate it from some other patterns that are similar in form or in meaning.
Beginning with the latter, the observed rule (see also section 3) that a
de-adjectival adverb cannot be fronted in a cleft sentence means that the adverbial
cleft pattern, consisting of an adverbial expression followed by a non-relative
verb, is confined to prepositional phrases (for more information, see Griffith's
chapter, this volume) and lexicalised adverbs;⁷ compare:


(4) issamlid sin dano bid icc 
COP3sec.pres=liKessc.neur. జరం then | COPS44 ir Salvationyoy 
disi tuistiu c[h]laindde 


tO3sc.rem=3SGrem bearingyo, children, py 
‘It is thus then that the bearing of children will be salvation to her.’ (Wb. 
28°17) 


Thus, while the adverbial element is contained in the fronted item samlid ‘like
it, thus’, the same is not permissible with, for instance, (in/co) maith ‘well’ (see
GOI §§ 379, 381), and instead of *is (in/co) maith airlithir, literally ‘it is well that
he cares’, the adverbial element is shifted to the relative connection itself (= the
relative pronoun of other languages), resulting in is maith n-airlethar, literally
‘it is good how he cares’.


7 As well as subordinate clauses, see GOI § 814. 
2.2 Adjective + subject complement clause


Additionally, a fronted adjective may not only be the antecedent of an adverbial
relative, but also of a subject clause (with the meaning ‘[the fact] that . . .’), for
which a nasalising relative is merely an option (cf. GOI § 503), yielding a formally
similar or indeed identical construction:


(5) a. with non-relative continuation, copula: 


is follus trisodin is 
COP sc. purs Cleatyom.sc.neur through=thatycc COPssc pues 
asintsalm hodüaid d[u]uic 


out-the=psalmp,; from=Davidp,; PV-bring;uc sso pnis 
‘It is clear thereby that it is out of the psalm from David that he brings . . .’ (Ml. 25718)
b. with non-relative continuation, stressed verb: 
is glé limm — niodigénte 


COP 3sc.pres Cleafyom.sc.neur Withise NEG-commitoy cop 
‘It is clear to me that ye would not commit . . .’ (Wb. 949)
c. with relative continuation, copula: 
isfollus doib asnoipred 
COP 35¢.pres=Cleatnom.sc.neuttO3p, COPssc pnzs. ల Workin gyom 
fir oirdnithi 
mang; appointed, 
‘It is manifest to them that it is the working of a supreme being.’ (Wb. 1°14)
d. with relative continuation, stressed verb: 
Is follus rundgabsat 
COP 3sc.pres Cleafyom.so.neur ~AUG-“°3SGyeurtake3py prer 
terchoiltisiu indiumsa 


thy-deterimationsyoy »-2SG inysg=1SG 
‘It is clear that Thy determinations are in me.’ (Ml. 7487)


Here, both semantic considerations and, in three cases, the intervening elements
(underlined) make it clear that these are not an adjectival cleft ‘it is clearly/in a
clear fashion that . . .’, literally ‘it is clear how . . .’, but non-cleft copula sentences
with a complement clause as subject: ‘it is clear that . . .’. The distinction to
be observed is thus between ‘(the circumstance) by which’ of the construction
under discussion and ‘(the fact) that’ with a subject complement clause. In individual
cases, particularly when the main verb is not the copula, doubts could
arise as to which of the two constructions is intended; compare:
(6) is derb contoroe farao achrid
COP3sc.pres CeltaiNyom.sc.neur PV-tUrNaye.3sc.prer Pharaohyoy his=heartacc 
do miscuis macc n israhel 


to hatredy4, children, NASIsrael,,, 

‘Certainly Pharaoh had turned his heart to hatred of the Children of
Israel.’ (Ml. 123^7) gl. bene conuertit Pharao cor suum ad odiendum populum,
quem...


Here only semantic considerations suggest that the concept of ‘certainly’ (with
an English adverb rendering the Old Irish adjective derb) is meant to qualify
the implied superordinate verb (e.g. in *‘One can state with certainty that . . .’)
rather than the verb of the associated sentence (‘had turned in a certain way’).
Similarly, bene convertit of the Latin original can hardly stand for ‘he turned
well’ = ‘he did well to turn’, but must mean elliptically ‘it may be well stated
that . . .’. The following gloss appears to be more ambiguous:

(7) ciaso demnithir so forcomnucuir bieid
although-COP35¢ pres certain, thiSace PV-turNaye.3sc.prer DO3sc.rur 
aimser nad creitfider et  dosluinfider 


timeyom NEGsu;believess;;urpAss and PV-denysss.rur.pass 
‘Though it is so certainly that it has happened, there will be a time when 
it will be disbelieved and denied.’ (Wb. 28°14) 


As presented in Thes., this glosses Spiritus [autem] manifeste dicit, quia . . .,
‘Now the Spirit expressly says that . . .’, suggesting that here, too, demnithir so
‘as certain as this’ qualifies an implied superordinate verb (e.g. in *‘one can say
as certainly as this that . . .’), just as manifeste qualifies dicit, in which case
‘certainly’ in Thes. would have to be changed to ‘certain’. However, Charles-Edwards
(1971) has shown that the full Latin context, quoted only partially in
Thes., includes the previous sentence (1 Timothy 3:16–4:1):


(8) et manifeste magnum est pietatis sacramentum quod manifestatum est in 
carne et iustificatum est in spiritu apparuit angilis praedicatum est gentibus 
creditum est in hoc mundo adsumptum est in gloria[.] Spiritus manifeste 
dicit quia in novissimis temporibus discendent quidam a fide . . .⁸


8 This follows the Würzburg manuscript itself (see Stern 1910: folio 2817-23, here given with
slight normalisation, mainly concerning abbreviations and word-internal spaces), which deviates
from that in the Vulgate (Weber 2007) in some minor detail only.
‘Great indeed, we confess, is the mystery of godliness: He was manifested
in the flesh, vindicated by the Spirit, seen by angels, proclaimed among the
nations, believed on in the world, taken up in glory. Now the Spirit expressly
says that in later times some will depart from the faith . . .’ (ESV).


In this, the Old Irish equative demnithir so in fact refers back to the first manifeste,⁹
which qualifies the following statement directly (literally ‘and manifestly
great is . . .’) and not via a superordinate verbum dicendi, meaning that the intended
construction is an adjectival cleft after all, including the correct translation
‘certainly’. As a final example, both constructions are found in:


(9) is gnath is trom achotlud 
COP3s¢.pres USUAlyom.sc.neurtCOP3sc.pres HEAVYnomsc.masc his=sleepyom 
adi 7 is cián m bis and 


ANAPH and COPssc.pres lONZnom.so.neur Deasc nan dlssc eur 

‘His sleep is wont to be heavy and he is wont to be long therein.’ (Ml. 100710)

(lit. ‘it is usual that . . . and it is long how . . .’ / or: ‘a long time by which . . .’)


2.3 Substantivised adjective as object antecedent 


In another superficially similar construction, the connection between the fronted 
adjective and the relative clause cannot be interpreted adverbially; rather, the 
predicate adjective serves as a substantive and is interpreted as the object of the 
headless relative clause: 


(10) isbecc rofitemmarni irrünaib de 
COP 356 pres=littlevom.se.neur AUG-knOWiy; pret/pres=1PL in=mysteriespyrp, GOdgen 
‘It is little we know in God's mysteries.’ (Wb. 125) 


9 Charles-Edwards (1971: 189) argues further that “demnithir in the Irish gloss refers to the 
first manifeste (the one not given in the Thesaurus) and compares it with the second manifeste. 
The glossator’s point is that it is just as certain a scriptural truth that Christ was incarnated 
etc. as that some will lapse from the faith. The two are equally manifest”. This, however, is not 
borne out by the text, where demnithir so, in referring to the first manifeste as ‘as certain as
stated above,’ does so by qualifying forcomnucuir ‘it has happened,’ without any connection 
to the second part of the gloss that alone corresponds to what follows the second manifeste. 
The preposition in i rrúnaib ‘in the mysteries’ here necessitates this interpretation,
as opposed to a putative adjectival cleft *is becc ro-fitemmar-ni rúna dé,
literally ‘it is small how we know God’s mysteries’. The same construction is
editorially assumed in:


(11) hūare rombu mor dorat düaid 
because AUG-“*COP3c¢.prer DigNomsc.neur PV-givegue.ssc.prer Davidyom 
[du]lzri frit adrad su 
of=diligence,,; to=yours, worshipping-2sG 
‘because David has given much diligence to worshipping Thee’ (Ml. 136711)


But the emendation in Thes. is not necessary if instead the manuscript reading 
is taken as an adjectival cleft, with an English adverbial translation ‘it is greatly 
that David has given diligence ...’ (Griffith and Stifter 2013). On the other 
hand, this is not possible (pace Mac Coisdealbha 1998: 155) in the following 
case: 


(12) ismó rochéess crist airi i.  báas 
COP35¢.prrs=more AUG-""suffer3;; see Christyom forssc.masc Le. deathyoy 
‘It is more that Christ has suffered for him, to wit, death.’ (Wb. 6*8) 


The main sentence as far as airi could be understood as an adjectival cleft with
adverbial meaning, ‘it is more greatly that Christ has suffered for him’, but only
if an innovatory leniting relative is admitted (cf. GOI § 506). This, however,
would leave the added explanatory object báas out of construction. Therefore,
Sims-Williams (1984: 193, albeit without quoting .i. báas in support) is right in
understanding mó “substantivally as object antecedent ... ‘it is a greater
(thing) (more) that Christ has suffered for him’” (see also Griffith’s contribution
in this volume).


(13) bid mó dongénaesiu oldaas rofoided cucut 
COP3sc.rur more PV." $do;...,,-2sG ‘than’ AUG-sendase.prer.pass tOrs¢ 
‘Thou wilt do it more than has been sent to thee.’ (Wb. 32a25),
gl. sciens quoniam et super id quod dico facies (Philemon 21) ‘knowing 
that you will do even more than I say’ (ESV). 


The translation offered in Thes. presupposes reading dongénae as don[d]genae 
with an infixed pronoun. However, Pedersen (1899: 391) suggests that “dagegen 


202 —— Jürgen Uhlich 


gehört bid mó dongenaesiu 32 a 25 eher in § 71 [however, bid mó dongenaesiu
32a25 belongs rather to § 71]”, referring to page 392f. where he deals with nasalising
relatives connecting to an object antecedent. That this, ‘thou wilt do
more . . .’, is the correct interpretation is proven by the Latin context quoted
above.


3 Validity of a rule excluding de-adjectival 
adverbs from fronting 


A few glosses superficially give the impression that in them, a de-adjectival adverb
is clefted, and they now need to be addressed individually.


(14) ba infortgidiu 7 ba hitemul 
COPssc pug; thepar.se.nevr=COVEFtpar.sc.ncur and COPss; 4; in=darknesspy+ 
dugnith saul 
PV-dO3sc.mpr Saulyoy 
‘it was covertly and it was in darkness that Saul ... used to make...’
(Ml. 30?3) (for infortgidiu the manuscript has imfortgidiu)


If extracted and viewed in isolation, ba in [sic leg.] fortgidiu du-gníth would indeed
constitute an adverbial cleft with a non-relative verb (for predicted *ba
fortgide du-ngníth ‘it was covert how he used to make’), but with the actual
pairing of two diverse fronted elements, the continuous phrase hi temul du-gníth
is a normal adverbial cleft beginning with a prepositional phrase, and with
such mixed fronting, the construction agrees most naturally with the second
phrase hi temul, and the first phrase has been secondarily adapted to suit this
syntactic context. For a similarly mixed fronting construction, compare Ml. 4179
in (88) below.


(15) a. non dificulter?" ” eueniat®" “2; 
i. ní baindodaing 
ie. NEG COP 35¢.ur-th€par.se.neur=Cifficultpa7.s6.neur 
‘i.e. it will not be with difficulty’ (Ml. 61°21)
b. dufórban 
PV-happenssc.pnzs 
‘it happens’ (Ml. 61°22; author's trans.)
Here one might expect that the connected Latin phrase non dificulter eveniat, ‘it
may/will not happen in a difficult way’, were explained by an equally unified
gloss, and in that case, in dodaing¹⁰ would be a fronted de-adjectival adverb,
and the whole sentence would stand for predicted *niba dodaing du-forban
(with nasalising relative). However, in the manuscript there is a clear space between
indodaing and dufórban, with gloss 61°21 being almost exactly coextensive
with the Latin phrase it explains and 61°22 only beginning over the second
part of the u of eueniat. Therefore, these are indeed separate glosses, and the
adverb in dodaing renders dificulter in isolation, not as the first part of a cleft
sentence. The same is even more clearly the case in:


(16) multum .i is indil asferr 
much i.e. COP3sc.rur theparse.neur=MANYpar.sc.neur COPssc pes a7 Detter 
iudeus quam gentilis 
Judaeus than Gentilis 
‘multum i.e. it is greatly that Judaeus is better than Gentilis.’ (Wb. 2a4)
gl. multum per omnem modum (Rom. 3:2) ‘much in every way’ (ESV)


As Thurneysen remarks on this isolated example, “the construction seems un-Irish”
(GOI § 383 n.). While the combination of a fronted adverb with a relative
verb could be justified as an incipient innovation (for which see GOI § 506), the
adverbial formation ind il itself (from il ‘many, much’, with ind as described in
GOI § 379) is entirely unparalleled. Instead, in order to express the concept of
‘greatly’ in this construction, *is mór as ferr might be expected; compare the
material collection in section 4.2 below that does not include cases of the
fronted positive mór, but, in many instances, the corresponding comparative
mó ‘more’ instead. Rather than representing a natural Irish expression, then,
ind il is best explained as a mechanical rendering of the Latin adverb multum,
and to the extent that this is an artificial process, a correspondingly artificial
translation ‘muchly’ may be proposed. This analysis ties in well with the more
general observation that the unmarked Old Irish equivalent of what would be
de-adjectival adverb formation in other languages is precisely the adjectival
cleft under discussion, thus most clearly with comparatives and superlatives,
for which direct adverbs like indluindiu (Ml. ౩21, ‘more angrily’, glossing commotius)
or inmáam (Wb. 1°20, ‘most greatly’, glossing primum), “are never


10 Griffith and Stifter (2013) take indodaing instead as containing the preposition in, but since 
dodaing is an adjective (albeit as such capable of substantivisation) and dificulter an adverb, 
direct equivalence of indodaing to the latter is more likely. 
found in a clause, but occur only as isolated glosses, the language of which is
probably somewhat artificial” (GOI § 382). Against this background, Mac
Coisdealbha (1998: 156) suggests that:


such a situation obtained in part also for the non-comparative adverbial derived from the 
adjective, i.e. that it was expressed as a fronted element in the COP. EMPH. construction. 
. . . This suspicion is strengthened by the general paucity of such ind-derivatives in the 
Würzburg period especially in complete clauses (as isolated translations of Latin adverbs 
they are more frequent). 


In the present case, the only difference is that such a gloss on a Latin word in 
isolation has been embedded unchanged into the adjectival cleft structure. The 
exact same mixed construction is found, with an embedded Latin adverb, in: 


(17) níbbu machdad tra bed figurate: 
NEG-COP3sc.prer. Wwonder,o, then COP3sc.rsr.sus; figuratively 
nombed a: uirgo filius asbeir hieronymus 


PV-*Sbe356.psr.susy the uirgo filius PV-Say3sc.pres Jerome 
‘and it were no wonder then that uirgo filius that Jerome speaks of, was
figuratively’ (Sg. 62^2)¹¹


On the other hand, an ellipsis of the natural Irish construction, i.e. even without
the following main verb, may be seen in:


(18) imgabaid  etbadtreit aris huisse 
Shunoy, wey; and-COPss;py-quickyowsc eur for-COP3s¢.pres PFOPCTNom.sc.rem 
aimgabdáil. 


its=shunningyom 
‘Shun ye and let it be quickly, for it is proper to shun it.’ (Wb. 946) 
(gl. fornicationem fugite) 


for which the complete expression of the second part may be predicted as *bad 
treit imme-n-imgabaid, ‘let it be quick how you shun (it)’ (while conceding that 
formally, treit could be either adjective or adverb). 


11 The adverb figurate was likely obtained from the wider context of this passage (Thes. 2: 
116.1 = Hertz [1855-1858] 2009, 1: 145.20). It is used shortly before (at Thes. 2: 115.16 = Hertz 
2009, 1: 145.15). 
4 Old Irish corpus of adjectival cleft sentences¹²
4.1 Fronted cian 


To begin with, a separate section is devoted to cian merely because in this construction,
it is impossible to decide if cian is used with its adjectival value
‘long’ or in its equally common substantival function ‘a long time’ (compare
also [89] in section 5.4 below).


4.1.1 With overt spelling of nasalisation 


(19) niba cián m bete oca cloinib 
NEG-COPssc ru; 1ONNom.sc.neut TH Desc Sanit at-their wickednessesy,; y; 
‘They will not be long at their wickednesses.’ (Ml. 28*10)


(20) ni ba cian mbias in pecthach 
NEG-COPssc ru; lOnyom.se.neur NSbessc FUT.REL theyomsc.masc Sininetyoy 
‘The sinner will not abide long.’ (Ml. 56°22)


(21) niba cián m bete and 
NEG-COPssc gur 1ONyom.sc.neur NASbear; FUT.REL iN3g¢.neur 
‘They will not be there long’ (Ml. 66°14)


(22) is cian m bis and 


NAS ; 
COP 35c.pres lONSNom.sc.neur be3sc.uaz Ínasc.ngur 


‘He is wont to be long therein.’ (Ml. 100710) 


4.1.2 Orthographically¹³ ambiguous regarding nasalisation


(23) iscián arfolmas dún insin 
COP 3¢6.pres=lONSyomsc.neur PV-undertakessc.prer.pass fOrip; theyom.sc=thatyom 


12 The following collection is intended to be complete for all sources edited in Thes., amounting
in the main to the Würzburg, Milan and St. Gall Glosses.

13 In what follows, a distinction is made between orthographically ambiguous for cases in
which a nasalisation, if present, would have been audible (such as in ar-folmas in (23), where
«f» could represent either unlenited f /f/, lenited f /∅/ or nasalised f /v/), and phonologically
ambiguous for cases in which a nasalisation could not have affected the stressed anlaut in
pronunciation beyond non-lenition (such as the r of do-réracht (24)).
‘It is long since that has been destined (has been imminent) for us.’¹⁴
(Wb. 21?2)
For relative ar- in (23) rather than ara-, see further under (90) below.


4.1.3 Phonologically ambiguous regarding nasalisation 


(24) is cian doréracht Emain 
COP3sc¢.prrs lONNomsce.neur PV-abandonsgg.prer.pass Emainyow 
‘Long since has Emain been forsaken.’ (Thes. 2: 317.6 and 317.15 [Hymn ii]) 


4.2 With comparative (and equative or superlative) 


According to Thurneysen (GOI § 383), the adjectival cleft sentence “is the normal
construction with adverbial forms of comparison”, albeit in a somewhat
condensed expression for “replacing” or “corresponding to adverbial forms of
comparison in other languages”, since within the Irish construction itself, only
the basic adjectival forms (i.e. those not overtly marked as adverbs) may be
used. Compare similarly Mac Coisdealbha’s (1998: 156) description: “The comparative
and superlative attributive adjectives and corresponding adverbs must
be formed predicatively with the copula.” On account of this observation, a separate
section is here dedicated to fronted degrees of comparison, and most of
the extant examples involve a comparative.

Commenting on the basic ‘is maith construction,’ Sims-Williams (1984: 193)
remarks further: “Note, however, that a nas. rel. clause is not regular in the similar
constructions with a comparative (Ml. 22°14 = [102] below) or a superlative, in
fer as deg do-cheil bile, ‘the man who best hides a tree’, Thurneysen 1946: 322
(cf. 681 n. 126) - on the Welsh construction which Thurneysen compares see
P. Mac Cana, Celtica 7 (1966) 91-115.” The two examples adduced, however, are
not parallel. While the second, construed superlative case illustrates a process of
syntactic raising of the second relative clause to the level of the first (a process
that will be addressed, with some real examples, in section 5.2 below), Ml. 22°14
begins with air is mou ru-icim les . . . ‘for it is more that I need . . .’ and thus with
a non-relative copula that does not deliver a context for raising. Instead, this is to


14 Following the translation in Kavanagh (2001: 103 s.v. ar-folmathar), vs. ‘it is long since he
destined (?) that to us’ (Thes. 1: 631), but see GOI (§ 708 note) on the analogical spread of the
third person singular passive preterite ending -s.


8 Nasalising relatives after adjectival antecedents —— 207 


be recognised as one of the few, innovatory exceptions to the nasalising relative 
rule in the adjectival cleft construction, see section 6.2. 


Apart from such exceptions to be discussed further below, the attested
cases involving fronted degrees of comparison are:


4.2.1 With overt spelling of nasalisation 


(25) a. isléir dorigni indalalestar
COP 35¢.pres=Carefulyom.sc.neur PV-dOaue.3sc.prer One.of.two=vessel ace
‘It is carefully he has made one of the two vessels.’ (Wb. 4°32) (= [46a]
below) followed by:
b. isdinnimu dongni alaill
COP 35¢.pres=Catelesscoyp PV." 5doss; is Otheracc
‘It is more carelessly that he makes the other.’ (Wb. 4°33)

(26) condibferr donberaidsi oldaas
so.that-COPssc pres.susy=better PV-“**3SGyrurSiVzpt.pres.supj=2PL ‘than’
cach
anyoneyom
‘that you may give it better than anyone (else)’ (Wb. 169) (donberaid for
do-nd-beraid; see Thes. 2: 477)

(27) ismóa dongnísom oldaas
COP 3s¢.pres=more PV-%"*3SGyqsc*dO3so.pres=3SGneur ‘than’
dontlucham
PV-"^53sG, or aSKip pres
‘He does it more than we ask it.’ (Wb. 21*9) (dongní . . . dontlucham =
don[d]-gní . . . don[d]-tlucham)

(28) Corrop mooassamoo et  corrop
so.that-COP auc.3sc.pres.sus) Imore-and-more and so.that-COPave.3sc.prss.supy
ferrassaferr donimdigi[d] desseirc dé
better=and=better PV. ^5multiplyos pres.susy LOVEacc GOdcrn
et comnessim
and neighbour;
‘So that more and more, and so that better and better, ye may abound in
love of God and of neighbour.’ (Wb. 23^1)
(29) combad mou de donadbastae molad
that-COP3sc.psr.suss mOre Ofssc rur PV-"5showsc psr supjpass 10121560
dé triachaingnímu
Godgrn through-his-good.deeds,«. py,
‘that the praise of God might be more shewn (sic) forth through His good
deeds’ (Ml. 37°23)

(30) ní lugu asnindet lāthar innandüle
NEG less PV-"Sshows,.,., dispositionyoy thegen.prrem=elementS gen. p.
dodia 7  nundfoilsigedar indáas
to=Gody,, and PV-"V5$3sc,.-manifest34; prrs ‘than’
‘not less does the disposition of the elements set forth concerning God
and manifest Him than . . .’ (Ml. 42°18)

(31) combad mou de nongabtis inna
NAS
that-COPssc »s sug; MOre Ofssg.neur PV." S takespr.psr.susy theacc.pr.neur
forngaire
commandSacc.p,
‘that they might the more receive the commands’ (Ml. 53°13)

(32) is mou dundrigensat indaas
COP 3cc.pres more PV-™*3SGyeur'dOquc.spr.prer ‘than’
didrairlécissiu
PV-3SGygur‘PerMityuc.2sc.prer=2SG
‘They have done it more than Thou hast permitted it.’ (Ml. 87°8)

(33) cesu meinciu aranecar... arecar
although-COPasc pages OfteNcomp PV pan finds. sores pase PV-find35c.pres.pass
dano cid so indhuathad
yet even thisyoy then,s.sc.ngurTTaTépar.sc.nNEur
‘although it is oftener found ..., yet even this is found rarely ...’
(Sg. 137°2)

(34) With equative:
islérithir inso nonguidimse
COP 356.pres=ZCalouSeq theaccsg=thisace PV-“*beseechisc, pres=1SG
dia nerutsu amal
God,cc fOrsg=2SG as
‘I beseech God for thee as urgently as . . .’ (Wb. 27719)




(35) With equative: 


is soirbidir sin forndengatsom 
COP 3g6.pres easy; జరం PV-“Soppress3pr pres=3PL 
inni bis 


theacc.se.masc=DEICT ace Desc nap.nrt 
‘even so easily do they oppress him who is . . .’ (Ml. 757)


4.2.2 Orthographically ambiguous regarding nasalisation 


4.2.2.1 Relativity marked otherwise 

(36) ba mmo immefolngitis bron | damsa 
COP3sc.prer more PVgjj4:causes3s np; griefacc tOjg¢-1SG 
‘[They] used more to cause grief to me.’ (Ml. 8646)


(37) ní lugu immefolngi sonartai do neuch 
NEG less PVgj4:causessspsss Strengthacc to  someonep,r 
incotlud indaas 


theyom.sc.masc=SleePyom ‘than’ 
‘not less does sleep produce strength to a man than . . .’ (Ml. 13513)


4.2.2.2 Relativity otherwise unmarked 


(38) doadbadar hic bríg inna persine 
PV-shoW3sc.pres.pass here mightyom thecrn.sc.rem person, 
dodiccfa asmó de focíaltar 


PV-3SGyggr:COme3sc, rur. COP3sc.pres.ne = MOTE Ofssc.neur PV-expectssc purs. pass 
‘Hic is shown the might of the Person that will so come, who is the more 
expected.’ (Wb. 29°4) (but see also as [86] below) 


(39) istraitiu adcotar fortacht dæ 
COP 35¢.pres=Quickcomp PV-obtaingsc.rres.pass Helpyom GOdgen 
‘the help of God is more quickly obtained . . .’ (Ml. 92°9)


(40) combad mou de nocrete són 
that-COPss; »srsug more Ofscg.neur PV-believesse.psr.suzypass ANAPHyom 
‘that it might be the more believed . . .’ (Ml. 11124)




(41) is deniu adciam hüasülib risiu 
COP3sc.pres Quickcomp PV-Sseejp pres from=eyeSparp, before 
rocloammar infogur hüachlüasaib ut est 
AUG-heatyp, pressus theacc.sc.masc=SOUNA,cc With-earsp,;,, that is 
is toisigiu adciam teilciud in bela  resiu 
COPasc pas fitStcomp PV:Sseej pags throwingacc theggiscwasc aXegg, before 
rocloammar a guth sidi 


AUG-hearip:pres.susy its soundycc ANAPHag, 

‘we see more quickly with the eyes before we hear the sound with the 
ears, ut est, we see the throwing of the axe before we hear the sound of it’ 
(Ml. 112012)


(42) ni   moa adcosnat
NEG more PV:strivess pres
‘(they) do not strive more . . .’ (Thes. 2: 6.29 [Carlsruhe Augustine 121])


(43) With superlative: 
fib as deg ropri[d]ched¹⁵
aS COP3sc.pres.ne best AUG-preach3sc.prer.pass 
‘as it hath been preached best’ (Wb. 23°3) 


4.2.3 Phonologically ambiguous regarding nasalisation 


4.2.3.1 Relativity marked otherwise 
(44) is moo sluindes pronomen persin quam 


COP3sc.pres more signifyssc.pres.re pronoun persona, than 
aliae partes 


15 Strictly speaking, it has yet to be demonstrated if «p» in a nasalising context is ambiguous
merely in orthography (= /b/) or also phonologically (= /p/). Caution in this regard is suggested
by the behaviour, or perhaps merely representation, of p under lenition, where it is,
or appears, “sometimes lenited, sometimes not . . . Evidently the process, which had developed
by analogy with the other stops, particularly with b: f, had not yet become universal”
(GOI § 231.5) for this originally foreign sound/letter. If the background to this is not merely
graphical (cf. McManus 1983: 48, n. 63), but phonological, p may have shown similar hesitation
initially to undergo nasalisation (a possible early instance of the marking of nasalisation
may be seen in the doubling in ippennit, ‘in penance’ (Thes. 2: 247.8 [Cambrai Homily])).
other parts
‘The pronoun, more than the other parts of speech, signifies a person.’ 
(Sg. 197711) 


4.2.3.2 Relativity otherwise unmarked 

(45) amal as trummu forlūadi hisuidi 
as COP 3sc.pres.re. heaVycous PV-SWayssc pus IN=ANAPHpar 
‘as it sways more heavily therein’ (Ml. 7955)


(46) a. isléir dorigni indalalestar 
COP 3¢¢.prrs=Carefulyou.sc.neur PV-dOque.3sc.prer One.of.two=vessel ace 
‘It is carefully he has made one of the two vessels.’ (Wb. 4°32) 
(followed by [25b], repeated here as [46b]) 
b. isdínnimu dongní alaill 
COP 35¢.pres=Carelesscomp PV." *doss; aes otheracc 
‘It is more carelessly that he makes the other.’ (Wb. 4°33) 


4.3 With positive adjective 
4.3.1 With overt spelling of nasalisation 


(47) nibu degming donet[h]adsom¹⁶ achorp
NEG-COPasc se; difficultyoy sc. cur PV." gc psr sus] JSGyasc his-body,cc 
fadesin issuidiu 
own in=ANAPHpar 
‘It was not difficult for him to go to his own body then.’ (Wb. 13°20) 


(48) nicumung donindnagar arforcital duib 
NEG=straightyom.sc.nrur PV-““Sbestow3sc.pres.pass OUF=teachingyoy 102: 
‘Not straitly [sic Thes.] is it that our teaching is given to you.’ (Wb. 16711) 


16 Assigning this form to do-etha ‘goes to, visits, approaches’. On the other hand, Thes. 1: 726
(addendum 588) reports “MS. donecadsom (‘that he should see’), Chroust,” i.e. from do-éccai ‘looks
at, beholds, sees.’ Finally, Kavanagh (2001: 348) takes the lead from Pedersen (1909-1913, 2: 514)
in positing a verb do-éta ‘clothes’ (albeit intended by Pedersen implicitly as subj. only), i.e. ‘that
he should clothe his own body therein’ (cf. induere in the Latin context). The principal point in the
present context, the relative nasalisation, remains unaffected.
(49) nibrónach donintarrái
NEG=sadyom.se.neur PV-™Sreturnauc.3sc.pRer
‘It is not sadly¹⁷ that he has returned.’ (Wb. 16°18)

(50) niba uaithed dondriga
NEG-COPssc rur f€Wyom.sc.neur PV- NA53SG,icu 7" COIDE sso pur
‘It will not be with a few¹⁸ that he will come.’ (Wb. 25°38)

(51) iseicrichnichthe donindnigsom
COP 3¢.pres=UNlimitedyou.sc.neur PV: ““Sbestowsg.pres=3SGmasc
adagmóini
his-benefits,cc p.
‘It is unlimitedly that He bestows His benefits,’ (Wb. 28°17)

(52) arndip maith nairlethar a muntir
so.that-COP 356 pres.susy ZOOdnom.sc.neur NAS CATE35G.PRES.SUEJ his household,
‘that he care well for his household’ (Wb. 28°32)

(53) mad ain[m]netach fondamtar inna
: : NAS,
if-COP3>¢.pres.susy Patientyom.sc.neur PV- "Suffersp, pres. supypass thenompr.neur
imneda inbetha frecndairc
troublesyow p», thecen.sc.masc=WOtldcen presentgen
‘if the troubles of the present world be borne patiently’ (Ml. 466)

(54) airis menic dondecmaing,
for-COP3s¢.prrs OfteNyom.sc.neur PV- M53sGycuhappenssc pres
‘For it often happens thus.’ (Ml. 54?7)

(55) amal as trait forndiuclannar ade
as COP 3sc.pres.rer QUICKyom.sc.neur PV: NASdevourasc. purs. pAss ANAPH
‘as it is quickly devoured’ (Ml. 104^5)


17 The interpretation as ‘sadly’ (implying the literal connection ‘sad how’), rather than ‘sad’
(literally ‘sad that’; see section 2.2), is confirmed by the Latin context: abundantius magis
gauisi sumus super gaudio Titi, quia refectus est spiritus eius ab omnibus uobis (2 Cor. 7:13), ‘we
rejoiced still more at the joy of Titus, because his spirit has been refreshed by you all’ (ESV).

18 Greene (1971) has shown that the DIL entry 1 úathad/óthad/úaithed is generally and originally
an adjective meaning ‘few’.


(56) is dian immamberat acossa oc ind
COP3sc.prers SWiftyom.sc.neur PV ger 5plyss; pres their=feet,ccp, at the
figi
weaving par
‘they ply their feet swiftly in the weaving.’ (Ml. 111°17)

(57) coru[p]léir dungné nech
that-COP yuc.3se.pres.supy=Carefulyom.se.neut PV-"^*dos.c purs sup; someoneyom
inpreceupt
theacc.sc.masc=teaching acc
‘that each one may diligently do the teaching’ (Ml. 129^1)

(58) issain donadbantar sensibus
COP 35¢.pres=Cifferentyou.sc.neur PV" ShOWsc.pres.pass Sensibus
‘Differently is it shown sensibus.’ (Thes. 2: 4.32-33 [Carlsruhe Augustine
10°2])

(59) is bec nand sinunn
COP3sc.pres littleyom.sc.neur NEG; 5COPsss purs SaM€yom.sc.NEUT
andéde nisiu
theyom.se.neur=tWOyoy DEICT_this
‘These two (explanations) are nearly the same.’ (Sg. 76?3)

(60) isdilgen doneprinn
COP 356. pres=SeNtleyom.sc.neur PV-"Pflowssc ves
‘Gently it flows.’ (Sg. 145°4)


4.3.2 Orthographically ambiguous regarding nasalisation 


(61) is sonairt atreba ni clantar
COP.3SG.PRES firm.NOM.SG.NEUT PV-dwell.3SG.PRES something.NOM plant.3SG.PRES.PASS.REL
‘What is planted dwells firmly.’ (Ml. 63°9)

(62) combad ellam nocomallaitis aní asrochoilset
that-COP.3SG.PST.SUBJ speedy.NOM.SG.NEUT PV-fulfil.3PL.PST.SUBJ the.ACC.SG.NEUT=DEICT PV-determine.AUG.3PL.PRET
‘that they should speedily fulfil what they had determined’ (Ml. 952)


(63) commixtum interpretatur .i. cummascdae adfét
commixtum interpretatur i.e. mixed.NOM.SG.NEUT PV-tell.3SG.PRES
in salmso di buaid innam babelóndae
the.NOM.SG.MASC=psalm.NOM=PROX of victory.DAT the.GEN.PL.MASC Babylonians.GEN.PL.MASC
‘i.e. this psalm speaks mixedly of the victory over the Babylonians.’ (Ml. 1159)


4.3.3 Phonologically ambiguous regarding nasalisation 


4.3.3.1 Relativity marked otherwise 


(64) inbec máo .i. isbec as
the(?)=little.NOM.SG.NEUT more i.e. COP.3SG.PRES=little.NOM.SG.NEUT COP.3SG.PRES.REL
máo oldáusa .i. is bec
more ‘than’.1SG=1SG i.e. COP.3SG.PRES little.NOM.SG.MASC
inderscugud,
the.NOM.SG.MASC distinction.NOM

‘A little greater i.e. she is a little greater than I, i.e. the distinction is small.’ (Sg. 45715)

(gl. paruo maior¹⁹ in paruo maior quam ego, ‘a little greater than I.’)


In view of the standard teaching on the formation of adverbs,²⁰ Stokes and
Strachan (Thes. 2: 99 n. c) wonder if for inbec “leg. inbiuc, or is becmáo a compound?”.
Neither the emendation, however, nor the assumption of an unparalleled
compound is necessary, if one (i) derives the element in(d) in adverbs not from
the article, but from the preposition/preverb ind(-),²¹ and (ii) takes account of the


19 In the manuscript, the gloss begins above the p of paruo and continues beyond maior into 
the empty space to the right of the column, whereas quam ego begins the following line. 

20 “To form an adverb, the dat. sg. of the adjective preceded by the article – or at all events by
a word identical in form with the article – is generally used” (GOI § 379); cf. Mac Coisdealbha
(1998: 155–157).

21 As considered in GOI (§ 379 note) and first argued by Morris Jones (1913: 439), cf. further
Vendryes (1928), including his comment on the related Old Latin preposition endo that “elle
fact that the dative form is not found with all cases of this formation.²² Examples
of what instead must be the accusative are listed in GOI (§ 379) itself, such as:


(65) a. indoll ‘amply’ (Sg. 220?6; author's trans.), gl. ultra ‘beyond’, rather
than *ind ull – for the expected raised vowel cf. the comparative huilliu,
e.g. Sg. 70°6;
b. inmade, inmadæ ‘in vain’ (Wb. 19^10, 19°16) gl. sine causa ‘without
cause’ – contrast the dative mudu (Wb. 1624);
c. ind immdae ‘abundantly’ (Sg. 26?5), gl. examosin²³ – “beside normal”
(GOI § 379) indimdu (Ml. 35°5), gl. passim ‘in every part’.


For forms like ind oll and ind immdae, the article is clearly ruled out, since its
accusative singular (masculine/feminine) form would be in n-. Instead, the possibility
of accusatival/directional adverbs (including with prepositions taking
the accusative), beside datival ones, is further supported by the alternative formation
with co ‘to’ (GOI § 381) – consider further the English alternative between
‘in a certain way’ and ‘to a certain extent’. Accordingly, the preposition
in(d) could take either the dative or the accusative,²⁴ and in bec is to be taken
as a regularly formed adverb – as an alternative to datival inbiucc ‘in small
measure’ (e.g. Sg. 39?25) – glossing paruo in isolation (followed by máo for
maior), before the entire Latin phrase paruo maior quam ego is rendered in
more natural Old Irish by the adjectival cleft sentence is bec as máo oldáu-sa.


4.3.3.2 Relativity otherwise unmarked 
(66) nirbu faás foruigéni
NEG-COP.AUG.3SG.PRET empty.NOM.SG.NEUT PV-serve.AUG.3SG.PRET
‘Not void has been his service.’ (Wb. 137)
gl. et gratia eius in me uacua non fuit (1 Cor. 15:10) ‘And his grace toward 
me was not in vain.’ (ESV) 


devait admettre après elle le datif aussi bien que l'accusatif [it must have taken after it the
dative as well as the accusative]” (Vendryes 1928: 78) and Lambert (1995: 174–175), the latter
assigning the preposition ind the meaning ‘en direction de, contre [towards, against]’.

22 A fact that is not mentioned by Thurneysen in GOI despite being reflected in his collection
of examples, or by Lambert, while some cases were at least pointed out in Thurneysen (1909:
§ 378).

23 Recte examosim (Sg. 268), for examussim ‘according to a rule or measure, exactly, regularly,
perfectly’ (Glare [1968-1982] 2012: s.v. examussim). The gloss, therefore, is “probably
guesswork, concluded from the context” (Hofman 1996, 1.2: 115).

24 Like its allomorph i n- etc.; see GOI (§ 842).
(67) isdían dorreractid maám indsoscéli
COP.3SG.PRES=swift.NOM.SG.NEUT PV-abandon.AUG.2PL.PRET yoke.ACC the.GEN.SG.NEUT=gospel.GEN
‘It is swiftly that ye have abandoned the yoke of the gospel.’ (Wb. 186)

(68) isimde dorrindnacht dun
COP.3SG.PRES=abundant.NOM.SG.NEUT PV-bestow.AUG.3SG.PRET.PASS to.1PL
‘Abundantly it has been bestowed upon us’ (Wb. 20715)

(69) is cosmail disin dano asrobrad
COP.3SG.PRES similar.NOM.SG.NEUT from=that.DAT then PV-say.AUG.3SG.PRET.PASS
‘similarly then . . . was applied . . .’ (Ml. 37°24)

(70) nant maith oroitatar arríg
NEG.SUB-COP.3SG.PRES good.NOM.SG.NEUT PV-guard.AUG.3PL.PRET their-king.ACC
‘that they did not guard their King well’ (Ml. 55*1)


As an interim summary, it can be stated that all examples of the adjectival cleft 
construction adduced so far — which constitute the vast majority — either show 
a clearly marked nasalising relative verb or are orthographically/phonologi- 
cally compatible with it. 


5 Apparent exceptions to the nasalising relative 
construction 


5.1 Non-class C infixed pronouns 


(71) ní maith domrignis
NEG good.NOM.SG.NEUT PV-1SG-do.AUG.2SG.PRET
‘Not well hast thou made me.’ (Wb. 4°27)

(72) nípadro(mór)²⁵ notbocctha
NEG-COP.3SG.PST.SUBJ=too.great.NOM.SG.NEUT PV-2SG-move.2SG.PST.SUBJ
‘Thou shouldst not boast overmuch.’ (Wb. 5°32)


(73) menicc atchíth hi físib / 
Oftenyomsc.neur PV-3SGygr Sees pe in VisiOnSpr pi 
dosnicfed afrithissi 


PV-3SGrem “comes can again 
‘Often he used to see in visions that he should come to it again.’ 
(Thes. 2: 312.4 [Hymn ii]) 


In ascertaining if, in the adjectival cleft construction, a nasalising relative main
verb is compulsory (as indicated in GOI §§ 383, 505), the first observation concerning
the examples above is that no nasalisation, or indeed relativity, is
marked in them, and dom-rignis and not-bocctha²⁶ could serve unchanged to introduce
a main clause, such as described, for instance, in GOI (§ 505) as an alternative
to most other nasalising relative constructions. Accordingly, Pedersen
(1899: 414) remarks on (71) that “das . . . zu erwartende relative n fehlt [that . . .
for which the expected relative n is missing]”, while Mac Coisdealbha (1998: 155)
similarly points to the fact that (72) “do[es] not show nasalization” as an argument
against the presence of a nasalising relative. Moreover, Isaac in Mac
Coisdealbha (1998: 257) compares (72) directly with (100) below:


As for [these] two [examples], the fact is that the opportunity for nasalization is there. 
5b32 [= (72)] could have shown nasalization if it had contained the Class C infixed 


25 Vs. “nípadruo-, worauf zwei buchstaben etwa abgerieben wurden, Chroust [nípadruo--,
after which about two letters have been erased]” (Thes. 1: 725 [addendum 528]). Previously,
Strachan (1899: 42) had suggested reading nípa[d] dron . . . . The images both in the facsimile
(Stern 1910) and online (http: ;
compatible with reading nipadro (with a gap between d and r) and at least one more minim
(such as the beginning of an n or m). The anonymous reader cautions that there is hardly
enough space for ro[mór] in the manuscript and suggests that the intended adverb may be rom
‘early, too soon’, and furthermore that rather than taking boccaid plus reflexive infixed pronoun
with the unparalleled meaning ‘boasts’, one of the more common meanings ‘softens’ or
‘moves’ (DIL) may be intended – thus implying, for instance, a literal ‘that it not be too soon
that you move yourself’.

26 Since (73) is not preserved in a contemporary Old Irish manuscript, the infixed pronoun in
at-chíth can be taken either as expressing Old Irish prolepsis (cf. GOI § 421) or as showing (in
this case scribal) Middle Irish petrification of neuter pronouns (cf. McCone 1997: 172-173). In
the latter case, the Old Irish original might have had non-proleptic *ad-cíth, with c- = /g/ marking
the expected nasalising relative.
pronoun: *nípadromór nondatbocctha. But it is perfectly regular for Class A, which cannot
realize nasalization, to appear in this position, GOI § 413.2. 13a29 [= (100)] could have
read *badféal et badfedte dongneid cachréit. But it does not. In both cases, the opportunity
to nasalize was not taken. Thurneysen's rule can, then, at best be taken as a facultative
formal strategy.


Pace Isaac, however, his two examples represent two different types of excep- 
tion. In bad fedte do-gneid, the opportunity to nasalise is indeed there and was 
not taken, meaning that this and some similar cases constitute real exceptions 
to the predicted nasalising relative construction and will be addressed as such 
in section 6. In cases like (72), on the other hand, that opportunity would have 
had to be created first by switching the infixed pronoun from class A to class C, 
and it is the opportunity of marking the relative by using class C that was not 
taken, not that of applying nasalisation, which is formally impossible with 
class A. In order to assess the latter type of exception, therefore, it will be nec- 
essary first of all to review the use of infixed pronouns in relative clauses. 
Thurneysen (GOI § 413.2) observes that in relative verbs, class C “regularly
replaces the pronouns of class A in the third person only; but it is frequently
(though not invariably) used instead of the first and second persons of A and
all the forms of B”. From this it follows – as is in fact conceded by Isaac himself
above – that in a relative context, a form with a class A²⁸ or B pronoun is to be
considered just as relative in status as one with class C, albeit without any
overt marker of relativisation, which can only be expressed – by lenition or nasalisation
– on the -d- of a class C pronoun (for more discussion on some of
these points, see García-Castillero's contribution to this volume). To test this,
one needs of course to rely on constructions that are unambiguous in requiring 
a relative verbal form, and the clearest, and entirely undisputed, case in this 


27 The same direct comparison is implicit in Ó hUiginn's (1986: 58) more complete collection
of formally deviant examples from the glosses, comprising, on the one hand, (71) and (72), as
well as, on the other, arachrin in (96) and dogneid in (100) and some others of the same nature
(on which see further sections 6.1 and 6.2 below), leading him to summarise that “it would be
more accurate to see the use of the nasalizing relative in such cases as more of a normal custom
than a fast rule”. Note, however, that while Ó hUiginn (1986) classifies verbs with non-Class C
pronouns as non-relative throughout his collections (e.g. p. 44), his reference (p. 67) to
“instances of a class A inf. pron. . . . being retained in a rel. clause” implies – in my view
correctly – that a non-class C pronoun alone cannot serve to prove that the verb is non-relative.
28 Compare Strachan (1903a: 67, n. 3): "It does not seem to have been noted that, when the 
short forms of the infixed pronouns of the first and second persons appear in relative use, rela- 
tive -n- is not inserted before them." 

29 While lenition of d cannot be marked within the Old Irish orthographical system, its presence
can be deduced from the parallel relative nasalisation yielding written -nd- (GOI § 504 [b]).




regard is the leniting relative following a subject antecedent. The first two ex- 
amples below, (74) and (75), illustrate that while a class C pronoun is admissi- 
ble in order to express both relativity and lenition (on the d- of -dam-), it is not 
compulsory for this person (first singular), and class A -m- may be used as well, 
without affecting the underlying syntax: 


(74) indi fodamsegatsa
the.NOM.PL.MASC=DEICT PV-1SG-afflict.3PL.PRES=1SG
‘those who afflict me’ (Ml. 33°19; author's trans.)


(75) Isiress crist nombéoigedar
COP.3SG.PRES=faith.NOM Christ.GEN PV-1SG-vivify.3SG.PRES
‘It is Christ's faith that quickens me.’ (Wb. 19°20)


Example (75) is a subject-fronting cleft sentence with mandatory leniting rela- 
tive connection, so it would not be useful to classify the verb nom-béoigedar 
‘which enlivens me’ as non-relative merely because neither relativity nor leni- 
tion could be expressed or realised on the surface. Rather, this is a leniting rela- 
tive clause in status, but without the possibility of relative marking or lenition 
because of the choice of the more unmarked class A for the infixed pronoun. 
Furthermore, (76) may serve to illustrate Thurneysen’s rule that within class A, 
only non-third persons are admissible in relatives: 


(76) nitú nodnai(l) acht ishé not ail
NEG-thou PV-3SG.MASC(C)-NAS-nourish.3SG.PRES but COP-he PV-2SG(A)-nourish.3SG.PRES
‘It is not thou that nourishes it, but it that nourishes thee.’ (Wb. 5°28)


Again, both parts of this sentence are subject clefts, entailing mandatory leniting
relatives,³⁰ but the relative is marked only by and on the third person singular
masculine class C pronoun -d-n- in the first verb. This is not formally
possible with the second person singular class A pronoun -t- in the syntactically


30 “Altogether distinct from this is the use of a non-relative form in the second of two parallel
relative clauses, a construction found in many other languages” (GOI § 505 note), referring to
amal as toisegiu grián indáas laithe , is laithe foilsigedar cech rét sic is toissigiu ‘as the sun is
prior to the day, and it is the day that makes clear every thing, so . . . is prior . . .’ (Ml. 85°11),
which shows a pairing of two (underlying) relatives, not of two cleft sentences, each of which
contains a relative.
parallel second verb, but not-ail must nonetheless be classified functionally as
a leniting relative, and not as a non-relative verb.

Similar considerations apply to the use of class B in relatives.³¹ For this, it
is instructive to begin by comparing the only other construction in which “a nasalizing
relative clause can[not] be replaced by a formally independent (i.e.
principal) clause” (GOI § 505), namely the figura etymologica connecting a verbal
noun with its own verb via an adverbial nasalising relative (literally ‘wherewith’,
‘by which’), as described in GOI (§ 499). Against McCone's (1980: 23-24)
objection, that some examples in Wb. and Sg. show a leniting relative instead,³²
Stüber (2010–2012: 235, 240) not only follows Ó hUiginn (1983: 123-124) in distinguishing
three syntactic types of figura etymologica,³³ but also demonstrates
that the perceived exceptions are limited to the first two types, i.e. to object or
subject-antecedent constructions that allow or indeed require a leniting relative
also elsewhere. Among the strictly ‘third-type’, i.e. adverbial relative figurae
etymologicae,³⁴ Stüber (2010–2012: 252–254) observes merely two deviant cases:


(77) frissan ingraim ataroigrainn saul
to-the.ACC.SG.NEUT persecution.ACC PV-3PL(B)-persecute.AUG.3SG.PRET Saul.NOM
‘as to the persecution wherewith Saul persecuted them’ (Ml. 30°2; emended
reading by Griffith and Stifter 2014: 58-59)


(78) dond fritobairt mail fritataibret
from=the.DAT.SG.FEM opposition.DAT slow.DAT.SG.FEM PV-3PL(B)-oppose.3PL.PRES
nadorche donsoilsi
the.NOM.PL.NEUT=darknesses.NOM.PL to-the.DAT.SG.FEM=light.DAT
‘from the slow opposition with which the darkness opposes itself to the
light’ (Sg. 183°3)


31 Apart from preverbs ending in a dental such as ad-, where both *að-ð- with class C and
*að-d- with class B yield at- /ad/ through homorganic delenition; cf. massuthol atomaig, ‘if it
is desire that drives me’ (Wb. 10°26 (ad-aig)).

32 See with further examples Ó hUiginn (1983: 123, n.2), and cf. Ó hUiginn (1986: 34). 

33 “The first of these is that in which the antecedent acts as the grammatical subject of a passive
verb . . . In the second type the antecedent functions as the object of an active transitive
verb”, while in the third type, “the antecedent verbal noun” is taken up by a “following rel.
clause which already has a subject or object” (Ó hUiginn 1983: 124). The latter “tripartite type”
(Ó hUiginn 1983: 124) alone necessarily figures the adverbial relative connection under consideration
here.

34 Termed “instrumentalisch [instrumental]” by Stüber (2010–2012: 231, 245).
Stüber (2010–2012: 253) considers “dass in diesen zwei Beispielen offenbar
keine Relativsätze vorliegen [that these two examples apparently do not involve
relative clauses]”, but concedes the alternative that these are examples of “der
seltene Fall [the rare case]” of a class B pronoun used in a relative clause, for
which she adduces the following parallel:


(79) ishe dano cotammidethar
COP.3SG.PRES=he then PV-3PL(B)-power.3SG.PRES
‘It is He then that hath power over them.’ (Ml. 17°2)


All three of these examples clearly show a class B pronoun in a relative verb;
contrast the same preverbs with class C, first in the same verb in-greinn as in
(80), then in fris-gair (81) and finally in con-ocaib (82) – (80) and (82) feature
nasalising relatives, (81) a leniting one:


(80) anindagreinnsiu
when-NAS-PV-NAS-3PL(C)-persecute.2SG.PRES=2SG
‘when Thou dost persecute them’ (Ml. 3692) (for aninda manuscript has anunda)


(81) is i BéFind friss doghair 
COP3s6.pres she Bé '""Findyo, PV-3SGren(C) " correspondsss pres 
‘It is [the name] Bé Find that corresponds to her.’ (Bergin and Best
1934–1938, 158, § 23; author's trans.)³⁵


(82) ancondammucbaitisse
when-NAS-PV-1SG(C)-exalt.3PL.IMPF=1SG
‘when they used to exalt me’ (Ml. 39711)


Compare further the following passage from an originally Old Irish text for the 
regular interchange in relatives between third persons of class C in place of 
non-relative class A and unadapted class B: 


35 On fris-dog(h)air, see also Thurneysen (1940: 28), who points out that the combination of
preverbs in -s and a directly following d- of a class C pronoun in leniting relatives, i.e. as-d-
and fris-d-, is not found in the glosses and is otherwise very rare. For as-n/ri-d- in nasalising
relatives, see Ml. 31°22, 93°14, 1147, 1247. For further discussion of fris- with class C pronouns,
see García-Castillero's contribution to this volume.
(83) Is é dorósat na huili 7
COP.3SG.PRES he PV-create.AUG.3SG.PRET the.ACC.PL.NEUT all.ACC.PL.NEUT and
rodacruthaigestar 7 fodaloing ó nirt
AUG-3PL(C)-form.3SG.PRET and PV-3PL(C)-sustain.3SG.PRES by might.DAT
a chumachtai. Iss é nodaail 7
his power.GEN COP.3SG.PRES he PV-3PL(C)-nourish.3SG.PRES and
cotaói 7 nodafailtigedar 7
PV-3PL(B)-preserve.3SG.PRES and PV-3PL(C)-gladden.3SG.PRES and
nodasorchaigedar 7 cotamidethar 7
PV-3PL(C)-illuminate.3SG.PRES and PV-3PL(B)-rule.3SG.PRES and
dodaraithchiúir 7 atanúigedar na
PV-3PL(C)-redeem.AUG.3SG.PRET and PV-3PL(B)-renew.3SG.PRES the.ACC.PL.NEUT
huili,
all.ACC.PL.NEUT
‘He it is who has created all things, and who has formed them and who
sustains them by the might of his power. He it is who nourishes and preserves
and gladdens and illuminates and rules and has redeemed
and renews all things.’ (Strachan 1907: 3, 8)³⁶


Like in (76), this is a (longer) sequence of subject clefts, entailing mandatory 
leniting relatives, but again the lenition can be realised on the surface only 
with the class C pronouns, not with the three cases of class B. That, however, 
does not mean that these three verbs in the chain are not functionally relative 
and could serve to define an exception to the mandatory nature of a leniting 
relative after a subject antecedent. 

For the adjectival cleft under consideration, then, examples (71) to (73), by 
not offering the surface option of marking nasalising relativity because of their 
choice of the more unmarked infixed pronoun classes A and B, respectively, 
cannot therefore, it is true, serve as proof of the mandatory status of a nasalis- 
ing relative in this construction. Neither, however, can they be cited as excep- 
tions to it, since in form they are ambiguous as to whether they are main-clause 
verbs or functionally nasalising relative ones without overt relative marking. 


36 Given here in Strachan’s Old Irish restoration based on two manuscripts, but one manu- 
script copy agrees in all relevant detail concerning the infixed pronouns (Strachan 1907: 2), 
while in the second copy, only two of the pronouns have been corruptly transmitted (rocru- 
thaigestar, donail, see Meyer 1903: 242). 
5.2 Syntactic raising due to embedding


(84) hiris innaní as deg rochreitset hicrist
faith.ACC the.GEN.PL.MASC=DEICT.GEN.PL COP.3SG.PRES.REL best AUG-LEN-believe.3PL.PRET in=Christ.DAT
‘faith of those who have best believed in Christ’ (Wb. 31°6)


On this case, Pedersen (1899: 351) remarks: “rochreitset ist selbstverständlich
mit innani zu verbinden, as deg als adverbiale bestimmung zu rochreitset aufzufassen
[Needless to say, rochreitset is to be construed with innani and as deg to
be taken as qualifying rochreitset adverbially].” And Thurneysen comments:


An amalgamation of relative constructions . . . is . . . found when a superlative is taken
out of the relative clause and placed in front of it in periphrasis with a relative form of the
copula . . . Here, however, against the rule in § 498, the second relative clause remains a
leniting one. (GOI § 508)


and: 


Here it is more probable that innani is felt as the antecedent both of as deg and ro-chreitset 
(GOI 681, n. 126.) 


As for the superlative “taken out of” the relative clause, this positioning of de- 
grees of adjectival comparison is actually the only option to express them in 
adverbial function, as described above (section 4.2). But what is remarkable 
here is the use of a leniting relative in ro chreitset, since the basic sentence with 
a non-relative copula would be expected to be *is deg ro creitset, literally ‘it is 
best how they have believed’, with a nasalising relative. After embedding this 
clause into the context of syntactic dependency from inna n-i ‘of those,’ how- 
ever, ro creitset governed by deg was ‘raised’ to connect directly to the superor- 
dinate inna n-i to yield ‘of those who have believed’, with the normal leniting 
relative after a subject antecedent. What makes this example particularly 
valuable is the retained singular as after the plural inna n-i, thereby showing 
the “amalgamation” referred to in GOI - rather than a full adaptation to *ata 
deg ro chreitset, literally ‘who are best who have believed’ (cf. [89] below). A 
syntactically less complex case also adduced in GOI (§ 508) is: 


37 Thurneysen’s “remains” (GOI § 508) is accordingly to be understood as denoting the trans- 
ferral of the leniting relative from the now syntactically parallel as before it. 
(85) asmaam roSechestar arsidetaid
COP.3SG.PRES.REL=most AUG-LEN-follow.3SG.PRET antiquity.ACC
‘who has most followed antiquity’ (Sg. 208^15)


This example again adapts non-relative *is maam ro sechestar. Furthermore, 
Elliott Lash has suggested to me that what was listed above as (38), assuming 
an orthographically ambiguous nasalising relative, could instead be under- 
stood as raised, with defective marking of relative lenition, which is why it is 
here restated with new numbering: 


(86) doadbadar híc bríg inna persine
PV-show.3SG.PRES.PASS here might.NOM the.GEN.SG.FEM person.GEN
dodiccfa asmó de foc[h?]íaltar
PV-3SG.NEUT-come.3SG.FUT COP.3SG.PRES.REL=more of.3SG.NEUT PV-(LEN)expect.3SG.PRES.PASS
‘Híc is shown the might of the Person that will so come, who is the more
expected.’ (Wb. 29*4)


Compare finally the Modern Irish reinterpretation of this pattern, described as
“double relative construction” by Ó Nolan (1920: 114-116):


(87) Is é Pól an t-aspal is mó a d’fhág scríofa.
COP.PRES he Paul the apostle COP.PRES more C leave.PST writings
‘Paul is the apostle who has left the most writings.’ (lit. ‘Paul is the apostle
who is biggest/most who has left writings’) (Ó hAnluain 1999: 113-114,
§ 11.36 [1960: 125–126, § 225]; author's trans.)


In summary, these examples do not establish exceptions to the standard con- 
struction of adjectival clefts, because the expected nasalising relative would at 
any rate have been superseded by a leniting one only secondarily as the result 
of syntactic ‘raising’. 


5.3 Mixed antecedents 


A construction with mixed antecedents was already seen in (14) above, to be 
read as ba in[d] fortgidiu 7 ba hi temul du-gnith saul . . ., ‘it was covertly and it 
was in darkness that Saul ... used to make ...’ (Ml. 30?3), where the non- 
relative verb du-gníth follows the construction required by the surface-antecedent 
closer to it, hi temul, i.e. the adverbial cleft without a relative main verb, while
the first surface-antecedent was adapted to this construction by being turned 
into a fronted adverb that is not otherwise admissible. A different case of mixed 
antecedents may be found in: 


(88) cid dian 7 cian nothéisinn
although-COP.3SG.PST.SUBJ swift.NOM.SG.NEUT and far.NOM.SG.NEUT PV-LEN-go.1SG.PST.SUBJ
‘though I went fast and far’ (Ml. 419)


According to GOI (§ 506), the leniting relative here is one of the rare innovative 
deviations from the nasalising relative rule, which will be addressed in section 
6.2. However, while the first antecedent is the straightforward equivalent of an 
adverb in other languages - entailing an adjectival cleft *cid dian no-téisinn, 
‘though it was fast how I went’ - this is not the case with cian: while one can 
go swiftly/in a swift fashion, one does not go in a far fashion, but rather a long 
distance. This semantic difference necessitates understanding cian as a substantivised³⁸
adjective, with an ensuing object relative, and for the latter, a leniting
relative is one of the two options, i.e. *cid cian no théisinn, ‘though it was
a long distance that I went’. Here, too, then, the relative construction agrees
with the second, closer, antecedent.³⁹


5.4 Failure to mark relativity in a preverb 


(89) iscián arfolmas dún insin
COP.3SG.PRES=long.NOM.SG.NEUT PV-undertake.3SG.PRET.PASS for.1PL the.NOM.SG.NEUT=that.NOM
‘It is long since that has been destined (has been imminent) for us.’⁴⁰
(Wb. 21?2; author's trans.)


38 This is not the same as cian in its common use as a temporal noun ‘long time’ (cf. section
4.1 above, and also [89] below), inflected as an ā-stem rather than with neuter o-inflection
more typical of substantivised adjectives. And while one does not go in a far fashion, one can
of course go in an extended fashion, i.e. for a long time, which allows the option of an adjectival
antecedent in section 4.1 above.

39 When this paper was delivered at the original conference, Ruairí Ó hUiginn raised the valid 
objection that there are semantic limits to the pairing of mixed antecedents, so that disparate 
combinations he exemplified by ‘It was quickly and dinner I was eating’ are unlikely to occur. 
The same semantic discrepancy, however, is not found in the present case, nor is it in (14) above. 
40 Already listed as (23) above (see there for further details). 
According to GOI (§ 493.4), “the pretonic prepositions im(m)· and ar· have disyllabic
forms in relative clauses: imme· or imma·, ara· (arch. are·)”. From this, it
would have to be inferred that ar-folmas in (89) is non-relative. However, while a
secondarily shortened (syncopated) form ar- is limited to position before proclitic
ro, relative ar- is occasionally encountered elsewhere, including at least once in
Wb.;⁴¹ compare the following complete collection:


(90) a. ara-: ara-legthar (993), ara-thá (103), ara-foim (13°24), ara-bágim-se (1679),
ara-clessid (22°18), ara-neut-sa (23°27), ara-mbere (28°11), ara-nethem (31°17)
b. arru-: arru-dérgestar (4°13), arro-dibaid (11°19)
c. arar- (other than -ro-): ara-rethi (6°22)
d. ara-ro-: ara-rograd (3°25), ara-róit (4^19), ara-rogartsom (5723), ara-rogartsom (5°23),
ara-róitmar (9°10), ara-róit (610), ara-roéit (24?32)
e. ar-ro-: amal ar-rograd (9°13) – contrast 3°25 in (90d).
f. ar-: Ished ar-tha inso
for COP.3SG.PRES=it PV-LEN-remain.3SG.PRES the=this.NOM
‘It is this that remains.’ (30°13) – contrast ished insó ara-thá (10°3).


While (90e) is merely more likely to be relative than non-relative,⁴² (90f) is unambiguously
relative judging by both its context and the overtly marked relative lenition
and shows – unless dismissed as a copying error – the incipient spread of
the more common form ar- at the expense of relative ara-. Accordingly, ar-folmas
in (89) could be taken as an innovative variant (cf. Ó hUiginn 1986: 65) for overtly
relative ara-folmas.


41 See GOI (§ 493.4, note) and, for Ml. in particular, Strachan (1903a: 68) concerning cases of
ar- before ro; his collection does not differentiate between stressed and proclitic ro/ru, and furthermore
Ó hUiginn (1986: 65-66).

42 Compare the findings summarised by Ó hUiginn (1986: 56), according to which “where the
verb which follows ama(i)l is not in the past subjunctive mood, relative marking is normal”, accounting
for seventy-seven cases in the glosses, as against fifteen classified as non-relative – six
of the latter, however, merely show a non-third person class A infixed pronoun (Wb. 2?11, 14^13,
1682, 17°10, 27°19, Ml. 53°18 [if emended correctly]), a feature that is here rather taken as inconclusive
as to relativity (see section 5.1), and only 1672 clearly attests to non-relative status by
using the independent negative (amal ninfessed).
6 Exceptions to the nasalising relative
construction 


It remains to consider a small number of examples that apparently feature an
unambiguously non-nasalising or even non-relative verb. Some or all of these
were already noted by Pedersen (1899: 391, 413, 414), Mac Coisdealbha (1998:
155; cf. Isaac’s comment in the same book, p. 257) and Ó hUiginn (1986: 48-58),
without, however, distinguishing them from cases with non-third person class
A infixed pronouns like (71) and (72) (on which see section 5.1), and Wb. 608
adduced as a counterexample by Mac Coisdealbha⁴³ instead contains an object
antecedent (see [12] above).⁴⁴ This leaves five cases: (91), (92), (100) to (102).⁴⁵


6.1 Third-person class A infixed pronoun 


(91) corrup leir roscomallathar⁴⁶ intí ardatúaissi
so.that-COP.AUG.3SG.PRES.SUBJ diligent.NOM.SG.NEUT AUG-3PL(A)-fulfil.3SG.PRES.SUBJ the.NOM.SG.MASC=DEICT.NOM.SG PV-3PL(C)-hear.3SG.PRES
‘that he who hears them may fulfil them diligently’ (Ml. 129°2)


In line with Thurneysen’s (GOI § 413) ruling that in relative clauses, Class C 
“regularly replaces the pronouns of class A in the third person only” (for other 
persons, compare section 5.1), -s- here appears to point to a non-relative verb. 
On the other hand, Thes. 1: 724 (addendum 440) adds: “for the irregular s in 
roscomallathar cf. Wb. 9° 11, BCr. 10” 10” - recte: 


43 Isaac (in Mac Coisdealbha 1998: 257) accidentally quotes 6°9 instead. 

44 Another counterexample only tentatively invoked by Pedersen (1899: 391, 413) is to be in- 
terpreted differently: cid beicc daucbaidsi, ‘though it be of little worth, ye will understand it,’ 
(Wb. 21°12) (cf. Thes. 1: 634, n. e). 

45 That is, still more than allowed for by Sims-Williams (1984: 193): “counter-examples are 
dubious (the best is Wb 13°29).” 

46 On the use of ro, Thes. 1: 440, n. c. notes, “In sentences of this type I have no other
instances in the Glosses of ro- with the second verb ... Probably noscomallathar should be
restored.”
(92) ishé cruth inso ém nosmessammar
COP_3SG.PRES=he way_NOM the=this_NOM truly PV-3PL(A)-judge_1PL.PRES
‘This, truly, is the way we shall judge them.’ (Wb. 9°10) (on nosmessammar,
Thes. 1: 553, n. c notes, “leg. nommessamar?”)


(93) andusléicet inna rind
when-NAS-PV-3PL(A)-sink_3PL.PRES the_NOM.PL.NEUT planets_NOM.PL
‘when the planets sink’ (Thes. 2: 11 [Carlsruhe Bede 18^10])


The crucial question in order to assess (91) is whether the verbs in (92) and
(93) are unambiguously relative. In Ó hUiginn's (1986: 56-58) assessment,
“(In) C(h)ruth” with a relative verb is found six times in Wb., three times in Ml.
and five times in Sg., as against six non-relative cases in Wb. (including 92
above), but none in Ml. and Sg. However, while the present case features cruth in
the nominative, after which an adverbial nasalising relative connection (‘by
which’) should at least be an option, the other five non-relative cases all show
the dative in chruth-sin ‘in that way’ (Wb. 3°27, 18°16, 24°17, 24^13) or in chrud-so
‘in this way’ (Wb. 31°11) fronted in a cleft sentence, i.e. with an adverbial surface
antecedent, after which a non-relative verb is the norm anyway in Old Irish.
While the difference between cruth and in chruth is duly pointed out by Ó
hUiginn (1986: 56-57, cf. 65, 66), he includes the five examples with adverbial in
chruth because the nasalising relative occurs once:


(94) hore isinchruthso rumboi
because COP_3SG.PRES-the_DAT.SG.MASC=way_DAT=PROX AUG-NAS-be_3SG.PRET
dossom
to_3SG.MASC=3SG_MASC
‘because it is thus that he has been’ (Wb. 33°1)


However, the latter is a special case due to syntactic raising (cf. section 5.2, as
well as possibly [108] below): while the pattern of an adverbial cleft regularly
demands a non-relative verb, i.e. *is in chruth-so ru boi do-ssom, the entire
construction is here embedded into an hóre sentence, and the second part is then raised
to depend directly on hore, using the option of a nasalising relative (cf. e.g. húare
romboi, ‘because it was,’ Ml. 468).

As for aⁿ ‘when,’ Ó hUiginn (1986: 46-47) counts five cases with a relative
verb from Wb., one hundred and twelve from Ml. and fourteen from Sg., as against
seven non-relative ones from Ml. and one from Sg. Of these, however, five merely
feature a class A (non-third person) or B infixed pronoun,⁴⁷ for which see section
5.1 above, and two could instead show phonological loss of an interconsonantal
nasal.⁴⁸ This leaves only one assured exception,


(95) a. arrobu lintae
when-AUG-COP_3SG.PRET filled_NOM.SG
‘when it was filled’ (Ml. 25°16),
contrast e.g.:
b. arrombu suidigthe
when-AUG-NAS-COP_3SG.PRET placed_NOM.SG
‘when it was placed’ (Ml. 4846)


It is to be noted, however, that the verb in (95) is the copula. Ó hUiginn (1986: 66)
has not only shown that among all patterns that at least allow an adverbial
nasalising relative, “by far the greater part of the clauses which use [parataxis] contain
the copula” but also argues that “the use of the nas. rel. seems to have progressed
much more slowly in clauses containing the copula than it did in other clauses . . .
[thereby] preserving the older parataxis for longer”. Exceptions featuring the
copula thus may represent one specific systematic deviation from the use of a
nasalising relative, as allowed for in GOI (§ 505), rather than innovations as part of the
beginning demise of this construction as documented in GOI (§ 506).

Thus, after aⁿ, the regular construction with stressed verbs is indeed the
nasalising relative. As for cruth, and discounting instances of in chruth in adverbial
clefts, all other cases listed by Ó hUiginn (1986: 56-57) are relative,⁴⁹ establishing
the nasalising relative as the mandatory construction.⁵⁰ Examples (92) and (93),
therefore, are valid parallels for the rare, or incipient, use of the third-person
class A pronoun -s- in a relative verb, and for (92) in particular, the application of


47 anatammresa (Ml. 31°14), andumsennat (39°28), annumfindbad(a)igtisse (39914),
animmuntimchella (108?9), anaramréet (131^8).

48 an as[n?]glinn (Ml. 70°12), anas[m?]berat (Sg. 40°15); see the discussion in the last paragraph
of the introduction and especially fn. 6.

49 One of Ó hUiginn's (1986: 53) cases, cruth ropridchissem ‘how we have preached’ (Wb.
24°17), is not overtly marked as relative, but formally compatible. The antecedents in the
remaining cases are either cruth or dative (in) chruth, but the latter is not part of a cleft
construction, cf. in chruth nandrann insce, ‘as ... is not a part of speech’ (Sg. 22157); ciachruth
nombiad ‘how could He be’ (Ml. 17°26).

50 The same is to be observed, without exception, for other nominal antecedents in manner
clauses, see Ó hUiginn (1986: 56-58). His one reported deviation (Ó hUiginn 1986: 57, see also 50)
is in fact also relative: inmét beta firién in dóini ‘in proportion as men are righteous’ (Ml. 56°20).
the double-article rule in ishé cruth . . . can be taken as additional evidence for a
close, relative connection of the following verb.⁵¹ Accordingly, while corrup leir
roscomallathar in (91) could be counted as a rare exception in deviating from the
requirement of a (nasalising) relative in an adjectival cleft, it could also be taken
as a similarly rare case of the third-person (-s- in particular)⁵² class A pronoun in
a functionally (though formally unmarked) nasalising relative verb. Additional
support for the second interpretation may be seen in the fact that (91) (= Ml.
129"2) is paired with 129^1 (the two glosses render two parallel Latin
phrases), and the latter shows an overt nasalising relative: coru[p]léir dungné
nech inpreceupt (Ml. 129^1 = [57] above). A second isolated example featuring a
third-person class A pronoun is:


(96) is denithir sin arachrin cumachtae
COP_3SG.PRES swift_EQ that_ACC PV-3SG_NEUT(A)-perish_3SG.PRES power_NOM
innapecthach
the_GEN.PL.MASC=sinners_GEN.PL
‘Even so swiftly does the power of sinners perish.’ (Ml. 57°12; also listed as
an exception by Ó hUiginn 1986: 58, n. 35)


This is to be contrasted with the explicitly relative class C pronoun e.g. in: 
(97) amal arindchrin dé
as PV-3SG_NEUT(C)-LEN-perish_3SG.PRES smoke_NOM
‘as smoke perishes’ (Ml. 57°10; cf. Wb. 2721 32°10, Ml. 854510).


The apparent counter-examples are the following: 
(98) fobithin arachiurat
because PV-3SG_NEUT(A)-LEN-perish_3PL.FUT
‘because they will perish’ (Ml. 59°9)


51 See Uhlich (2013) (pace GOI § 471).

52 In this connection it is significant that, however rare -s- is in relative clauses in Old Irish,
this very use must have formed the basis for -s- developing into a mere relative marker in
Middle Irish, for which see Strachan (1904: 169-170). And if accordingly, -s- is viewed as a
low-register colloquialism (of the kind described by McCone 1985), it is paralleled by what
appears to be an early case of hypercorrect use of ro for no (for this more widespread feature in
Middle Irish, see Breatnach 1994a: §§ 11.4-11.5; McCone 1997: 189-190, 197).
(99) intan aracrínat
when PV-3SG_NEUT(A)-(LEN)-perish_3PL.PRES
‘when [they] perish’ (Ml. 7372; GOI § 423)


They differ from (96) in that they are readily explicable by the general exception
described in GOI (§ 505).⁵³ As for (96), this gloss also contains the same
construction with clear relative marking in amal as ndian ade ⁊ as ngair mbis
‘as it is swift and lasts for a short time’, Ml. 57°12 (= 107 below), even though in
the latter, mbis could also have been created by syntactic raising to be
construed directly with amal (see section 5.2, and example 94). At any rate, a
special explanation is required for (96), and a plausible factor to have caused the
use of class A -a- rather than class C -(n-)d- may be seen in the fact that
ara-chrin is one of a handful of Old Irish verbs that feature a petrified infixed
pronoun, whose meaning and function ceased to be understood synchronically, so
that the basic lexeme could be viewed, and generalised, as ara-chrin rather
than *ar-crin.⁵⁴


6.2 Absence of feasible relative nasalisation 
The considerations presented in section 6.1 leave the following three cases: 


(100) badféal et badfedte
COP_3SG.IMPV=faithful_NOM.SG.NEUT and COP_3SG.IMPV=honorably_NOM.SG.NEUT
dogneid cachréit
PV-do_2PL.PRES every_ACC.SG.MASC=thing_ACC
‘Let it be faithfully and let it be honorably [?] that ye do everything.’
(Wb. 13°29)


(101) cid toisigiu doberthar
although-COP_3SG.PRES.SUBJ first_COMP PV-do_3SG.PRES.SUBJ.PASS


53 While relative hore arinchrinat ‘because they decay’ (Wb. 27°1) is of course also possible.
54 See GOI (§ 423). An alternative explanation is implied by Thurneysen's description (GOI
§ 423) of intan aracrinat in (99) as simply ‘without d’, i.e. as if ara- were here the relative form of
the simple preverb without any pronoun attached. However, this is contradicted by the lenition
spelled out in (98), which points not to a nasalising relative as an option after such
conjunctions, but to the presence of a neuter pronoun. Thus, leg. ara-c[h]rinat with standard defective
spelling.
indfochaid
the_NOM.SG.FEM=tribulation_NOM
‘though the tribulation is inflicted first’ (Ml. 19°11) (see Thes. 1: 716
[addendum 26.24])


(102) airismou ruicim les mairchissechtae indaas
for-COP_3SG.PRES=more PV-reach_1SG.PRES need_ACC my=pitying_GEN ‘than’
‘for I need to be pitied rather than . . .’ (Ml. 22°14) (cf. also section 4.2; for
airismou manuscript has airimmou)


In all three examples, “the opportunity to nasalize was not taken” (Isaac, in Mac
Coisdealbha 1998: 257, commenting on [100]). Thus, these adjectival cleft
sentences clearly do not use a nasalising relative, but the question is, what do they use
instead? If Thurneysen's observation were applied, according to which “a
nasalizing relative clause can be replaced by a formally independent (i.e. principal) clause
in almost every instance” (GOI § 505), all three main verbs above would have to be
taken as non-relative. As Ó hUiginn (1986: 66) has shown, this type of systematic
exception is predominantly found with the copula. Furthermore, since none of
these forms can be proven by their orthography to be non-relative,⁵⁵ the alternative
is to take them as leniting relatives. Rather than systematic exceptions, then, these
may be isolated examples of the incipient “extension of the leniting at the expense
of the nasalizing relative” that will be completed by the end of the Middle Irish
period (Ó hUiginn 1986: 70; see also 69-75 for more details; also GOI § 506).


7 Conclusions 


Having thus reviewed almost the entire evidence for the construction of adjectival
cleft sentences (apart from one special environment to be addressed in the
appendix), it emerges (section 4) that the vast majority of examples either
shows overt marking of relative nasalisation, or the spelling is at least compatible
with this construction, be it ambiguous merely orthographically⁵⁶ or also
phonologically.⁵⁷ Section 5 addresses a number of formal deviations that are


55 Contrast, for instance, the unambiguous difference, with a simple verb, between relative 
hore pridchas ‘because he preaches’ (Wb. 7°15) (vs. non-relative pridchaid) and non-relative 
hore pridchim ‘because I preach’ (Wb. 5*6) (vs. relative no pridchim). 

56 Such as adcotar in (39), where the «c» can represent non-relative /k/ or nasalised /g/. 

57 Such as dorigni in (46), where the /r/ is not capable of being nasalised. 
argued to be due to additional factors (syntactic raising, mixed antecedents),
formal innovation (relative ar- replacing ara-) and, to begin with, the use of
non-class C infixed pronouns. These are formally incapable of marking relativity,⁵⁸
but are regularly admissible (except for class A third persons) nonetheless in
indisputably relative contexts like the mandatory leniting relative after a subject
antecedent. Therefore, their presence alone cannot serve to prove that the verbal
form that contains them is functionally non-relative. In other words, while e.g.
third person singular beires ‘who carries’ is exclusively relative vs. beirid ‘carries’
exclusively non-relative, the same distribution is valid for, say, nasalising
relative do-mbeir ‘by which ... brings’ vs. non-relative do-beir (with /b/) ‘brings’,
but not for the pair dondom-beir ‘by which ... brings me’ vs. dom-beir, since
the latter is found for both ‘brings me’ and ‘by which ... brings me’. Relative
forms like dom-beir may still, it is true, be classified as a type of exception, if
viewed from the diachronic perspective described by Ó hUiginn (1986: 67):


It has been held that the creation of the class C inf. prons. represents a relatively late de- 
velopment in the prehistory of Irish and grew out of a need to formally distinguish rel. 
clauses . . . Pronouns of the first and second persons seem to have been much slower in 
adopting the new rel. forms... 


In this scenario, the exception consists in not creating the opportunity for the
formal marking of underlying relativity by introducing a class C pronoun.
Relativity unmarked, however, does not equal non-relativity,⁵⁹ and
synchronically, such forms will still not fall under the rule that “a nasalizing relative
clause can be replaced by a formally independent (i.e. principal) clause” (GOI
§ 505), as is the case e.g. for Thurneysen’s example hóre ni-ro-imdibed with its
unambiguously non-relative negative ni. Therefore, the use of non-class C
pronouns (apart from third persons class A) is here treated as formally
inconclusive as to relativity and thus does not constitute an assured exception to the
use of a nasalising relative in adjectival clefts.

This left only five possible counterexamples (section 6), but even these 
were argued to be explicable not as systematic exceptions, but as individual in- 
novations affecting all types of nasalising relatives. The result of this study, 
then, is that Thurneysen's ruling that adjectival clefts trigger a mandatory (un- 
derlying) nasalising relative is to be upheld in principle, because there is no 
unambiguous evidence that a main-clause verb could be used instead. 


58 Relative lenition and nasalisation can only be expressed, and only nasalisation be shown 
in writing, on the -d- of a class C pronoun, cf. (26), (27), (32), (50), (54). 

59 As it does not, either, with the phonologically ambiguous forms listed above, such as dor- 
igni in (46). 
Appendix: The main verb attá ‘to be’


A specific application of the adjectival cleft pattern remains to be considered, 
which is described by Thurneysen in a separate paragraph, namely “when the 
antecedent supplies the concept that constitutes the predicative nominative of 
the relative clause” (GOI §500). One of his examples involves an adjective, as 
in (105) below, resulting in an adjectival cleft as discussed here, with the only 
difference that the main verb is ‘to be’, specifically the substantive verb (literally
‘it is X how Y is’, etc.).⁶⁰ Still, two subtypes are to be distinguished:


(103) asmenic mbis confitebor duatlugud
COP_3SG.PRES.REL=often_NOM.SG.NEUT be_3SG.HAB.REL confitebor for=giving_DAT
bude: ciasu gnathiu dofoisitin ... sed
thanks_GEN although-COP_3SG.PRES common_COMP for=confession_DAT ... sed
non est hic
non est hic

‘that confitebor is often for returning thanks, though it is more common
for confession, sed non est hic’ (Ml. 26°4)


While all cases feature the substantive verb, this first example would do so 
even outside the cleft pattern. While structurally required by the prepositional 
predicate du atlugud, this is not immediately suggested by the copula continua- 
tion ciasu gnathiu dofoisitin, but confirmed by the similar reference in: 


(104) biid didiu a confessio hisin
be_3SG.HAB then the_NOM.SG.NEUT confession_NOM DEICT_that
dofóisitin pecthae biid dano domolad
for=acknowledging_DAT sins_GEN.PL be_3SG.HAB also for=praising_DAT
biid dano do atlugud buide do foisitin didiu
be_3SG.HAB also for giving_DAT thanks_GEN for confession_DAT then
atásom sunt
be_3SG.PRES=3SG_NEUT here
‘that confessio, then, is wont to be to acknowledge sins, it is also to praise,
it is also to offer thanks: here, then, it is to confess.’ (Tur. 58?)


60 To the cases listed below might be added those in section 4.1.1 involving cían, if to be 
taken adjectivally. 
This means that ciasu gnáthiu dofoisitin in (103) must itself be an elliptical cleft
sentence, for ciasu gnathiu mbís . . . In the next three cases, however, the
surface antecedent itself stands for the predicate of the following verb (as opposed
to menic above with adverbial connection), which therefore must be used in
place of the copula:⁶¹


(105) cid drúailnide mbes
although-COP_3SG.PRES.SUBJ corrupt_NOM.SG.NEUT NAS-be_3SG.PRES.SUBJ
chechtar indarann
LEN-each_NOM.SG.NEUT the_GEN.PL.FEM=two_GEN.PL.FEM=parts_GEN.PL
isinchomsuidigthiu
in=the_DAT.SG.NEUT=compound_DAT.SG.NEUT
‘though each of the two parts in the compound be corrupt’ (Sg. 202°3)


(106) isfaittech rondboisom
COP_3SG.PRES=careful_NOM.SG.NEUT AUG-NAS-3SG_NEUT-be_3SG.PRET=3SG_MASC
nant neque manebunt asrubart
NEG_REL-COP_3SG neque manebunt PV-say_AUG.3SG.PRET
‘He was careful that he did not say neque manebunt.’ (Ml. 2154)


(107) amal as ndian ade ⁊ as
as COP_3SG.PRES.REL NAS-swift_NOM.SG.NEUT ANAPH and COP_3SG.PRES.REL
ngair mbis
NAS-short_NOM.SG.NEUT NAS-be_3SG.PRES.REL
‘as it is swift and lasts for a short time’ (Ml. 57°12)


In the absence of evidence that the copula itself could also be used as the main 
verb in this construction, the following case from the Additamenta in the Book 
of Armagh seems puzzling: 


(108) fer... nadip rubecc
man NEG_REL-COP_3SG.PRES.SUBJ too.little_NOM.SG.NEUT
nadipromar bedasommae
NEG_REL-COP_3SG.PRES.SUBJ=too.great_NOM.SG.NEUT COP_3SG.PST.SUBJ-his-wealth_NOM


61 For this and other regular uses of attá in place of the copula, see GOI (§ 774) and Mac
Coisdealbha (1998: 154-155).
‘a man ... whose wealth would not be overlittle or overgreat’ (Thes. 2:
241.8-9 and 27-29) (cf. Bieler 1979: 176-177, § 13 [2];⁶² note also that the
example is glossed according to the traditional interpretation.)


Here, nadip clearly contains the third person singular present subjunctive of
the copula, but what is bed(a)? The sequence bedasomme (folio 18°15) is
reproduced as such in Thes., but edited as bed a sommae by both Bieler (1979:
177) and Thurneysen (1949: 33), in which bed could be the third person singular
past subjunctive (or conditional) relative of the copula. On the other hand,
since all parallel instances above feature the substantive verb, one could
instead assume a third person singular past subjunctive of attá, standing for no
beth/bed with omission of no.⁶³ However, this would still leave a problem of
concord, according to which the present subjunctive in nadip would be
expected to be matched by the main verb of the cleft construction.⁶⁴ A solution
may be proposed that will begin by considering the word sommae. This is given
in DIL (s.v. 1 sommae, ‘riches, wealth’) as an iā-stem connected with soim ‘rich,
wealthy,’ implying a standard abstract formation. Thurneysen, on the other
hand, lists it as ‘Subst. neut.’ (1949: 103). Among the attestations in DIL for
both sommae and its counterpart 2 dommae ‘poverty, scarcity,’ there is no
evidence to ascertain the gender of either in Old Irish.⁶⁵ Moreover, besides the
adjectives soim and doim, there are also 2 sommae ‘rich, wealthy’ and 1 dommae
‘poor, needy,’ and on the other hand there are also sommatu ‘wealth, luxury’
and dommatu ‘poverty, want.’ This makes a derivational relationship soim >
abstract noun sommae and doim > dommae far less obvious, and if an original
semantic difference between abstract sommatu ‘the status of being rich’ and
concrete sommae ‘riches’, etc., can be assumed, sommae and dommae are
neuter nouns substantivised directly from the homonymous adjectives. In this case,
somma in (108) could be singular or plural, and for the use of the latter
(including a concrete meaning) compare marba sommai ‘goods will be destroyed’
(Meyer 1894: 40.9.13). This in turn opens up the option of taking the verb in
bedasomme as plural, too, and I suggest the following derivation that solves


62 This phrase has no equivalent in the corresponding passage in the Vita Tripartita, see
Stokes (1887: 188.26-27), Mulchrone (1939: lines 2221-2212).

63 Cf. combed hed nobed and, ‘so that that should be there’ (Wb. 3°10). For the omission of no
see Kelly (1999). Mac Coisdealbha (1998: 154) lists this as one of three cases that show “the
substantive verb in place of the copula” but does not explain the form.

64 Apart from cases where the introductory copula is reduced to “the unmarked, neutral
present tense” (see Mac Coisdealbha 1998: 144-145).

65 Dinneen's (1927) “soime. . . f., riches" could of course continue either gender. 
both problems, that of the verb ‘to be’ and that of concord: assuming that
bedasomme is among the numerous cases in the Additamenta that reflect an
Early Old Irish spelling, its original shape may have been *bede e somme,
involving a third person plural present subjunctive relative of attá, which in
(Classical) Old Irish is spelled bete.⁶⁶ This was misunderstood by a subsequent,
Old Irish scribe, who, instead of modernising it correctly to *bete a somme,
adapted it mechanically to bed a somme. And returning to the construction
under discussion, a separate ‘it should not be too little nor too great that his
riches are’ would be expected to be construed with nasalising relative, i.e. *nip
rubecc nip romar (m)bete a sommae. That the b- is not nasalised may be due to
one of two reasons: either this text is simply too early for relative marking to
have been analogically transferred to the initial of simple relative forms, or,
as Elliott Lash has suggested to me, this may be yet another case of syntactic
raising (cf. section 5.2 and [94]), in which expected (m)bete was raised to the
level of the preceding genitival (leniting) relative nadip . . ., as if depending
directly on the superordinate antecedent fer. Be that as it may, the resulting
merged syntagm, which subordinates the adjectival cleft construction ‘it should
not be too little or too great that his riches are’ to the genitival antecedent ‘a
man (whose)’, cannot be represented literally in English. As possible
approximations, however, the following may be suggested, namely either (a) ‘a man
concerning whom it should not be too little or too great that his riches are (or:
. . . too great whose riches are)’, thereby compromising on the genitival relative
connection, or (b) ‘a man whose riches should be such that they are not too
little or too great’, with a freer rendering of the clefting construction.


66 Cf. Wb. 10°22. For the earlier spelling convention of mediae for non-initial voiced stops, cf.
scarde, ‘who separate’ (Thes. 2: 247.18 and 39 [Cambrai Homily]), adobragart (Wb. 19°5 [prima
manus]), ‘has seduced you’ (cf. GOI § 31, note).

67 Cf. GOI (§ 495 [b]) for lenition, (§ 504 [c]) for nasalisation.
9 The “Cowgill particle”, preverbal ceta
‘first’, and prepositional cleft sentences
in the Old Irish glosses


1 Outline of the problem 
1.1 The “Cowgill particle” 


The origin of the absolute / conjunct distinction in the Insular Celtic languages 
has generated more literature than any other phenomenon in the history of the 
languages. The theory currently enjoying the greatest regard, generally dubbed 
the “particle theory”, asserts that a second-position clitic particle, the “Cowgill 
particle”, was inserted in most sentences and is ultimately responsible for the 
variety of verbal endings, as well as numerous other phenomena, such as
mutations or the lack thereof.¹ In the spirit of the particle theory, one can offer the
following simplified derivations for the absolute / conjunct distinction (where E 
stands for the enclitic Cowgill particle): 


Insular Celtic    Pre-Irishᵃ       Primitive Irish    Archaic Irish    Old Irish
*bereti-E         *bereti-E        *b'er'eθ'i         *b'er'eθ'        beirid ‘he carries’
*ni-E bereti      *ni-E beret      *nī b'er'eθ        *n'ī b'er'       ni:beir ‘he does not carry’
*to-E bereti      *to-E beret      *to b'er'eθ        *to b'er'        do:beir ‘he gives’
*ni-E tobereti    *ni-E toberet    *nī toβer'eθ       *nī toβer'       ni-tabair ‘he does not give’


Figure 1: Simplified derivations for the absolute / conjunct distinction in Old Irish.
ᵃ I am using Pre-Irish in a loose sense to indicate some time after the early i-apocope
(McCone 1996: 100-102) but before other distinctly Irish sound changes.


While the problem is fiendishly complicated, the above derivations capture the 
essential facts of the particle theory, which has two basic elements: the involve- 
ment of a second-position clitic and an early i-apocope. Both elements have been 


1 The standard literature arguing for a single particle is Boling (1972), Cowgill (1975), Schrijver 
(1994), and Schumacher (2004). 


Open Access. © 2020 Aaron Griffith, published by De Gruyter. This work is licensed under
the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
accepted by some and rejected by others, but most authorities accept some version
of both as having a role in the rise of the distinction.² The generally accepted form
of the enclitic is now *eti (Schrijver 1994; Schumacher 2004: 90-114), the form of
which has been determined based on direct British Celtic evidence, indirect Irish
evidence, and etymological considerations.

One of the challenges of the Irish evidence is that the particle *eti is itself totally
lost. The only clue to its original existence is the presence of additional
morphological material on the verb in the absolute endings (absolute beirid < *bereti-[e]ti vs.
conjunct -beir < *bereti) and the lack of otherwise expected lenition (main clause
do-cing ‘strides’ < *to-[e]ti-kingeti vs. relative do-ching ‘who strides’ < *to-io-kingeti).
Given this lack of direct evidence, any additional evidence for the presence of the
Cowgill particle would be quite welcome. As it happens, there is a small group of
forms, hitherto relatively unnoticed, which may offer support for the particle.


1.2 The preverb ceta ‘first’ 


A somewhat uncommon preverb ceta ‘first’ appears in the glosses. It is connected 
etymologically to Gaulish *Cintu in Cintu-gnatus ‘first-born’, Middle Welsh cynt 
‘earlier’, and Old Irish cét- ‘first, early’ (first member of compound, usually with 
nouns) and is reconstructed as *kentu ‘first’? (Matasović 2009: 201, LEIA C-103;
Evans 1967: 182; Zair 2012: 174). The Old Irish use of interest here is as a preverb: 


(1) is hé céetne fer ceta-ru-chreti di
COP_3SG.PRES 3SG_MASC first_NOM.SG.MASC man_NOM PV-AUG-believe_3SG.PRET of
ais assiz hi Crist
folk_DAT Asia_GEN in Christ_ACC
‘He is the first man of the folk of Asia that had first believed in Christ.’
(Wb. 7°11)


2 For instance, Kortlandt (1979) accepts the particle but rejects i-apocope, while McCone
(1979) accepts the i-apocope (see also McCone 1978) but rejects the particle. Other theories on
the origin of the absolute / conjunct distinction (e.g. Sims-Williams 1984, McCone 2006, and
Isaac 2007) accept both elements, but operate with a series of second-position clitics rather
than with a single particle. That the main-clause negation ni is itself made up of two particles
*ne + *est is not relevant to the discussion of the Cowgill particle, since the development *ne
est > *nest > *nist must be quite old, given the raising of *é to *i.
As noted by Thurneysen (GOI § 384, § 393) the preverb is never accented, always
appearing immediately before the accent and after all other conjunct particles
and preverbs (see also García-Castillero 2014):


(2) fris-cita-comrici diib
against.REL-PV-encounter_2SG.PRES of_3PL
‘which of them you first encounter’³ (Thes. 2: 23.38)


The preverb is most frequently attached to relative verbs, and it has various 
forms: ceta / cita / ciata, as well as variants with final -o or -u. As there are few 
attestations, they can be listed here in toto for the glosses, as found in Table 1. 
Since only Wb. and Ml. have more than one example, they will be the focus here.

To be kept separate from ceta ‘first’ is the preverb ceta found in ceta-bi 
‘feels, perceives’, which is cognate with Middle Welsh canfod ‘sees, beholds, 
perceives’. The equation shows that the preform was *kanta-bi- (Matasović 
2009: 188; Schumacher 2004: 83, 242, 245). Additionally, the verb con-céitban 
‘consents, assents’ shows the preverb in tonic position, which further differen- 
tiates it from ceta ‘first’. We can thus set this verb aside for now. 

Turning back to ceta ‘first’, we can examine the alternation of the first
vowel. There are not enough examples in Wb. and Ml. individually to be sure
that the choice between cita and ceta is not simply random. That is, assuming
that the scribes randomly chose cita or ceta,⁴ it is possible to arrive at the
distribution of forms seen in Table 1 for the two gloss collections. On the other
hand, if we compare the forms in Wb. to those in Ml., it is clear that the
preferences are different.⁵ There are various ways to interpret this difference. One
could argue that they represent different scribal or scriptorial practices,
regional variation, temporal variation or any of a number of different
possibilities. Being a historical linguist, I interpret the differences through the lens of


3 I take this as the verb con-ricc ‘meets, encounters’ preceded by the prepositional relative fris 
(a) and ceta ‘first’. This fits with the Latin text glossed: quae tibi ex his intranti uicinior ‘which 
of these (is) nearer to you having entered’. 

4 The form cíatu is ignored here, as it is likely analogical (GOI § 398). 

5 The following results are obtained by using Fisher's exact test, a standard statistical test 
for checking whether one variable (here ceta vs. cita) varies independently of another (here 
Wb. vs. Ml.). Wb. has 3 instances of ceta to 0 of cita; Ml. has 1 instance of ceta to 5 of cita. 
The test shows that the distribution of ceta / cita is not independent of the manuscript in 
which the forms appear, i.e. Wb. vs. Ml. The probability of this distribution arising by 
chance is less than 5% (p = 0.0476). 
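
The calculation in footnote 5 can be retraced with a short script. What follows is only a minimal sketch, assuming the usual two-sided definition of Fisher's exact test (summing the probabilities of all tables with the same margins that are no more probable than the observed one); the function name and the small tolerance factor are illustrative choices, not part of the original analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b          # first row total (here: Wb.)
    col1 = a + c          # first column total (here: ceta)
    n = a + b + c + d     # grand total

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    # (the factor 1 + 1e-9 only guards against floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: Wb., Ml.; columns: ceta, cita (counts as given in footnote 5).
p_value = fisher_exact_two_sided(3, 0, 1, 5)
print(f"p = {p_value:.4f}")
```

With only nine tokens in total, the observed table is the single most extreme one compatible with the margins, which is why the p-value is simply 4/84.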
Table 1: Pretonic ceta ‘first’ in the Old Irish glosses. 


Wb.                                 Ml. 
ceta-ru-chreti      Wb. 7°11        cetid-deirgni    Ml. 12453 
ciatu-ru-chreitset  Wb. 14°29       citid-tucat      Ml. 12534 
ceta-thuidchetar    Wb. 21°5        cita-roichet     Ml. 44°4 
cetu-ru-pridach     Wb. 26°4        cita-rochet      Ml. 86719a 
                                    cita-commairsed  Ml. 39°15 
                                    cita-rogaib⁶     Ml. 38°3 

ad-cita-ace         Tur. 60 
fris-cita-comrici³  Thes. 2: 23.38 
cita-ru-oirtned     Thes. 2: 241.16 


eighth century became / was on its way to becoming cita by Ml. in the late 
eighth or early ninth century.⁷ 

That the e in the first syllable of Wb. is regular seems reasonable from the 
point of view of Irish sound change. Insular Celtic *kentu- would have given 
*kédu- and then *ked- in pretonic position, and it is this that we find in Wb. 
ceta. We will return to the final vowel later. Somewhat puzzling is that the pre- 
tonic e in ceta did not give a as in a ‘his’ from Early Old Irish e. This may be 
due to analogy with the accented prefix cét- ‘first’ (or the fact that in ceta, the e 
was originally nasal, cf. the suggestion made in Lash 2017a). 

A similar sound sequence is found in etar ‘between’ < *enter. Comparing 
*kentu- and *enter is instructive, since both are found in pretonic position and 
both have a similar phonetic structure. Interestingly, however, their outcomes 
are not totally parallel. Table 2 gives the outcome of the sound sequence *ent in 
both Wb. and Ml. in pretonic position. 

It should be clear that Wb. shows no difference for etar vs. ceta, while Ml. 
shows a considerable one. In Griffith (2016a: Appendix) it is argued that the pre- 
tonic sequence et in etar was on its way to becoming it. The tendency can be 


6 Note that this example could be analysed as cita-ro-gaib. The same could probably be ar- 
gued for cita-ro-chet as well. The exact position of ro is not critical for the purposes of this 
paper, but the fact that the particle appears as ro is suggestive of its being accented, since pre- 
tonic ro tends to become ru, especially as a second pretonic preverb (see GOI § 101 for the rule 
and Stifter [2013] for extended discussion of it). 

7 The exact date of the Milan Glosses is unclear, but it is generally placed in the late eighth to 
early ninth centuries (see Lash 2017a: 148 and references therein). 
Table 2: Outcomes of pretonic *enter and *kentu in 
Wb., Ml., and Sg.⁸ 


        Wb.             Ml.             Sg. 
        etar   itar     etar   itar     etar   itar 
        31     1        60     4        28     10 
        ceta   cita     ceta   cita     ceta   cita 
        3      0        1      5        0      0 


clearly seen in Table 2. The tendency is, however, much stronger for ceta than for 
etar. In Griffith (2016b), it was argued that the only plausible explanation for the 
difference is that etar was a (relatively common) preposition in Old Irish, while 
ceta was not. Schumacher (2012) has documented the frequent analogical inter- 
actions between accented and unaccented allomorphs of prepositions (see also 
Griffith 2016a: Appendix), and it seems that the accented conjugated preposi- 
tional forms of etar exerted enough analogical influence to slow the change of 
pretonic etar to itar. Since ceta was not a preposition, this analogical influence 
was not present, allowing for a much faster, presumably regular phonological 
change to cita. Since the presentation of Griffith (2016b), Lash (2017a) has made 
a strong case for a slightly different interpretation of these facts. He notes that 
the i-variants are concentrated not only in pretonic position, but in a specific 
subset of this position: in pretonic complexes (which he defines roughly as a pre- 
tonic position containing more than one element). His argument is persuasive 
and I accept most of his findings here. One of the advantages to his approach is 
that it can explain the differential rate of change from etar to itar and ceta to cita 
in a straightforward manner without recourse to analogical influence. Since the 
preverb ceta ‘first’ was largely, though not exclusively, found in relative verbs 
(García-Castillero 2014: 87-89), which are pretonic complexes, the preverb would 


8 The vowel in the final syllable is written variably in the sources. As it is not important for 
the topic under discussion here, I have simply written it as a here (see the appendix in Griffith 
2016a on the final vowel in etar / itar). The data in Table 2 come from Table 1 above for 
ceta and Lash (2017a: appendix, Tables A, B, and C) for etar. The relevant parts of Lash's tables 
are the columns “prep(osition)” and “preverb”, with the further proviso that only verbal forms 
where the etar is pretonic are included here. I further only consider glosses to Priscian in the 
St. Gall column and thus exclude other minor gloss collections that appear in his Table 
B. Finally, I exclude Wb. 283 etir fessin et dóini ‘between himself and men’ because etir is a 
conjugated preposition and thus not relevant to pretonic position. I also include two examples 
of etir from gloss Ml. 97?7 where Lash notes only one. 
be more likely to change its e to i than etar, which had a more balanced distribu- 
tion in pretonic complexes and simplexes. 
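
The differential rate of change discussed above can be made concrete by computing the share of i-variants per gloss collection from the counts in Table 2. This is a minimal illustrative sketch: the dictionary layout and the helper name `i_share` are assumptions made for the example, and cells without any attestations (ceta / cita in Sg.) are reported as undefined rather than as zero.

```python
# Counts (e-variant, i-variant) transcribed from Table 2.
counts = {
    "etar/itar": {"Wb.": (31, 1), "Ml.": (60, 4), "Sg.": (28, 10)},
    "ceta/cita": {"Wb.": (3, 0), "Ml.": (1, 5), "Sg.": (0, 0)},
}


def i_share(e_count, i_count):
    """Proportion of i-variants, or None when there are no attestations."""
    total = e_count + i_count
    return None if total == 0 else i_count / total


shares = {
    item: {ms: i_share(*pair) for ms, pair in by_ms.items()}
    for item, by_ms in counts.items()
}

for item, by_ms in shares.items():
    for ms, share in by_ms.items():
        label = "no data" if share is None else f"{100 * share:.1f}% i-variants"
        print(f"{item:10s} {ms:4s} {label}")
```

The raw proportions say nothing about significance on their own, particularly for ceta / cita, where each collection contributes only a handful of tokens.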

We will shortly have occasion to return to Lash (2017a), which focuses on the 
initial vowel of ceta, but now the final vowel of ceta < *kentu-, which has re- 
mained outside the discussion, must be accounted for. The oddity is not so much 
that the final vowel is variously written as i (cetid-deirgni [Ml. 1243]), a (cita-roi- 
chet [Ml. 44°4]) or u (cetu-ru-pridach [Wb. 26°4]), but rather that it is present at 
all. Despite Thurneysen (GOI 873), final vowels should be lost in all preverbs. 
The only recognised exception is in relative position, where there is precedent for 
preverbs having an extra syllable. This is standard for the preverbs ar and imm 
(GOI § 493.4), as the contrast between the (a) and (b) examples below shows: 


(3) a. ar-beir 
       PV-enjoy3SG.PRES 
       ‘he enjoys’ (Ml. 43°14) 
    b. ara-m-ber 
       PV-REL-enjoy3SG.PRES 
       ‘that he enjoys’ (Ml. 69718) 


(4) a. im-folngi 
       PV-make3SG.PRES 
       ‘it makes’ (Wb. 4d32) 
    b. imma-folngi 
       PV-REL-make3SG.PRES 
       ‘which causes’ (Wb. 16b8) 


While Breatnach (1994b) has also found examples of the extra syllable with 
other preverbs in relative constructions, he notes that for these other preverbs, 
the extra syllable is not the rule but rather the exception:⁹ 


(5) asa-gusi 
    PV-REL-wish3SG.PRES 
    ‘who wishes’ (Ml. 61°17) 


9 For instance, there are two examples of the preverb in(d) with an extra syllable in relative 
construction. This might be expected, given that the preverb was *inde / *eni, but the preverb 
normally does not have an extra syllable in relative construction, and it takes class B infixed 
pronouns, not class A (as noted by Breatnach 1994b: 198). For the preverb *as, the form asa 
even appears in non-relative position, as noted by Thurneysen (GOI § 834B). This fact will 
enter the discussion again below. 
For the preverbs ar and imm, the extra syllable in relatives is the remnant of the 
fact that these preverbs historically ended in a vowel: *are and *ambi (see Uhlich 
2009-2010 on the quality of the final vowel; but see also García-Castillero, this vol- 
ume). Since we know that *kentu- also ended in a vowel historically, it would not 
be at all far-fetched to assume that it would also retain this vowel when relative, 
like ar and imm: *kentu-io- > ceta, as suggested by Jürgen Uhlich (apud García- 
Castillero 2014: 88, n. 23). Since ceta developed into a preverb largely in relative 
contexts (García-Castillero 2014: 87-89), ceta would be the most frequent form of 
the preverb. Nonetheless, ceta also appears to be the non-relative form. Evidence is 
limited, however, to two verbal forms from Milan: cita-rochet and cita-cómmairsed. 


(6) airní doib cita-rochet 
for-NEG-COP3SG.PRES to3PL PV-singAUG.3SG.PRET.PASS 
‘For it is not to them that it was first sung.’ (Ml. 86719a) 


(7) combad frisnagruade 7 
so.that-ᴺᴬˢCOP3SG.PST.SUBJ against-theACC.PL.NEUT=cheeksACC.PL and 
frisnaforbru cita-commairsed 
against-theACC.PL.MASC/NEUT=eyebrowsACC.PL PV-meet3SG.PST.SUBJ 
‘so that it might be against the cheeks and against the eyebrows that it 
would first meet’ (Ml. 39°15) 


It is noteworthy that in both cases, the verb is found in a cleft sentence with a 
fronted prepositional phrase. According to the rules of Old Irish syntax these 
verbs should be non-relative (Strachan 1929: 123, n. 7), and the form of the pre- 
verb, ceta, should thus be surprising. As has been noted, however (GOI § 506; 
McCone 1985: 96), relative verbs are occasionally found in such sentences even 
in the glosses.¹⁰ If these two verbs are indeed relative, then there is no real prob- 
lem with their form cita,¹¹ but it should not be forgotten that an interpretation as 
relative does contravene the rules of Old Irish grammar.¹² A perhaps unexpected 


10 I thank Elliott Lash for reminding me of this fact. 

11 Lash (2017a) does take these two forms as pretonic complexes, which means that they are 
relatives. I am less ready, however, than he is to assume that these two examples of cita are 
relative verbs. They may be, but the likelihood of that is not totally clear. For now, I assume 
they are non-relative, but at the end of this paper I reconsider the possibility that one or both 
are, in fact, relative. 

12 The concept “rule” here should not be understood as a straitjacket but rather as a generali- 
sation based on observed phenomena. In that sense, a form that goes against the rules should 
be seen as inviting further investigation. That investigation will follow below. 


246 —— Aaron Griffith 


suggestion that leaves the rules intact is that the retained final vowel in non- 
relative forms of ceta is regular before the Cowgill Particle *eti. The derivation 
can be posited as here: 


(8) *kentu-eti > *kentuueti > *kedoueh > *kédow’ > *kédoi > *kéde > ceta 


The important assumption of this derivation is that there was no elision of the *e 
of *eti after *u. The standard assumption underlying the particle theory is that 
the *e was elided after any vowel (Schumacher 2004: 98-99; see Jasanoff 1997: 
152-153 for a possible — though analogically motivated — exception). No one 
seems to have considered cases after *u, however, probably because the vowel 
rarely appears in a position where it would be in contact with the Cowgill parti- 
cle. As such, it might appear that this rule is ad hoc. There are a couple of further 
verbs, however, which support the non-elision argued for here: the verbs ocu-ben 
‘touches’ and ceta-bi ‘perceives, feels’. 

The first of these verbs, ocu-ben ‘touches’, is fairly rare. The six forms at- 
tested in the glosses are all from Ml. and are given below in Table 3: 


Table 3: Ocu-ben in Ml. (forms in bold are non-relative). 


ocu-biat 126512 occu-robae 98*8 
ocu-bether 53517 ocu-bendar 54°12 


nicon-rocmi 76°12 nad-ocmanatar 54°12 


Note that the disyllabic pretonic preverb oc(c)u is both relative (2x) and non- 
relative (2x) and that the preverb ends in a historical *u: *onku- (GOI § 848; con- 
tra Matasović 2009: 299, who derives it from *onko-).¹⁴ This verb thus gives 


13 It should be noted that the precise developments here are uncertain. For example, it is not 
clear whether *uu would fall together with *ou even at this late period, as it did earlier (McCone 
1996: 55). If it did not, then *kéntuueh > *keduw' > *kédui > *kédoi etc. is the likely development, 
since there was no difference between *ui and *oi at this stage (Cowgill 1967: 135-137; Greene 
1976: 39; Uhlich 1995: 15-16; Schrijver 2007: 362 n.12; see also Bisagni 2012: 14). 

14 Pedersen’s explanation of the retained -u in ocu-ben as due to a third person singular neu- 
ter infixed pronoun (Pedersen 1909-1913, 2: 298) seems unlikely. The verb is transitive, and 
we would not expect a meaningless infixed pronoun with a transitive verb. Compare also the 
verbal noun with an objective genitive: cid cuit a ocmaide ‘even as to touching it’ Ml. 39?10. 
The objective genitive with verbal nouns is normal for transitive verbs. While an infixed pro- 
noun with a verb that is inherently transitive is certainly possible, the resulting verb does not 
added support to the suggestion that the Cowgill particle does not elide its vowel 
after *u. Why the final vowel in ocu-ben is consistently u but more regularly a in 
compounds with ceta is uncertain. It may reflect the nature of the preceding con- 
sonant: velar [g] favours u while dental [d] is neutral (cf. McCone 2015: 127 for a 
comparable observation in the context of consonantal u-quality). 

The second verb relevant for the question is ceta-bi ‘perceives’. As noted 
above, the preverb is *kanta ‘along’ and is thus different to ceta ‘first’ < *kentu-. It 
is nonetheless relevant here. In relative verbs, *kanta would have regularly given 
ceta, much like ar / relative ara and imm / relative imma and imme (as well as 
relative ceta < *kentu-io-). For non-relative verbs, while there are no other exact 
parallels of preverbs of a shape like *kanta, *cet would have been the most likely 
outcome. Evidence from the glosses (see Table 4) shows that the form of the pre- 
tonic preverb is always ceta, regardless of whether it is relative or not (non- 
relative forms are in bold): 


Table 4: Ceta-bi in the glosses (forms in bold are non-relative). 


Wb.                      Ml. 
ceta-biin   12°8         cita-m-bí      36^1 
                         cita-m-bé      36^1 
Sg.                      cita-m-bénn    44°15 
ceta-biat   351          cita-m-betis   29°13 
                         cita-biat      22°7 
                         cita-bé        68°15 
                         cita-roba      44°22 


Once again, there is not a large number of forms, but the relative and non- 
relative forms all have the final vowel -a (and incidentally, the vowel of the first 
syllable in Wb. and Ml. corresponds to the pattern for *kentu- ‘first’). The likeli- 
est explanation for the unexpected appearance of the final vowel of the preverb 
in non-relative forms of ceta-bi ‘perceives, feels’ is that it was taken over from 


remain transitive but becomes intransitive (cf. at-baill ‘dies’ < ‘throws it’ or at-reig ‘rises’ < 
‘raises himself). 
the otherwise identical ceta ‘first’, which has been argued to have ceta regularly 
in both relative and non-relative clauses. 

If all of this is correct, we have some further, limited support for the second- 
position clitic main-clause particle in the (Pre-)Old Irish verbal complex. While 
this particle usually disappears without a trace of influence on the preverb it was 
attached to, it seems to modify the shape of the preverb in exactly one case: 
when the disyllabic preverb contained a *u in the final syllable. There are only 
two such preverbs known to me: ceta ‘first’ < *kentu- and ocu < *onku-. The pre- 
verb ceta, found in ceta-bi, appears to have followed the pattern of ceta ‘first’ in 
non-relative contexts. 


1.3 Cleft sentences with fronted prepositional phrases 


As was noted above, the main verb in cleft sentences with fronted prepositional 
phrases should be non-relative according to the rules of Old Irish grammar 
(Strachan 1929: 123, n.7). A more complete account says that when the subject or 
object is fronted, the main verb is relative (see also GOI § 494, § 501), but when 
anything else is fronted, the main verb is non-relative unless the word must be 
followed by relative -n- (i.e. in temporal clauses, manner or degree clauses, figura 
etymologica, source or cause clauses, with adverbially used adjectives; see GOI 
(§ 383, §§ 497–502) and Uhlich’s contribution to this volume). The focus here is 
on fronted prepositional phrases (hereafter PPs), where non-relatives are 
expected: 


(9) ar is do thabirt dígla berid in 
for COP3SG.PRES to ᴸᴱᴺbringDAT revengeGEN carry3SG.PRES theACC.SG.MASC 
claideb sin 
swordACC DIST 
‘For it is for wreaking revenge that he carries that sword.’ (Wb. 6713) 


As noted in GOI (§ 506; see also McCone 1985: 96; Ó hUiginn 1986: 63), relative 
verbs are occasionally found: 


(10) Ni fris ru-chét a propheta 
NEG-COP3SG.PRES against3SG.MASC/NEUT AUG-ᴸᴱᴺsing3SG.PRET.PASS by prophets: 
‘It is not with reference to it that it was sung a propheta.’ (Ml. 64713) 


9 Prepositional cleft sentences in the Old Irish glosses — 249 


(11) acht is do sochaidi no-pridchib 
but COP3SG.PRES to multitudeDAT PV-preach1SG.FUT 
‘but it is to a multitude that I will preach’ (Ml. 45?8) 


The question addressed here is how common the use of relatives in such sen- 
tences in the major gloss collections is. 


2 Methodology 


Examples of cleft sentences were collected inclusively (i.e. with a wide net) 
from the three major Old Irish gloss collections: Wb., Ml. and Sg. This included 
examples with fronted prepositions and adverbs (as there is significant overlap 
in function), as well as fronted subjects and objects. As it turned out, there are 
no exceptions to the rule that fronted subjects and objects are followed by a 
relative verb, and these sentences are not considered further here. After collect- 
ing the cleft sentences, numerous possible examples were excluded: 
- as noted above, subject and object clefts as well as adverbial clefts¹⁵ were 
excluded; 
- examples without overt copula were discarded, since other interpretations 
are possible (see below in [12] for an example); 
- examples with nasalising relatives according to GOI (§ 383, §§ 497–502) 
were set aside; 
- some other examples were also excluded, such as noun phrases used ad- 
verbially: in chruth so ‘in this manner’ or in méit sin ‘in that size, so much’. 


Two examples of excluded sentences are given here in order to show the types 
of considerations made during the analysis: 


(12) per prophetas do-n-icfad cucunn 
through prophetsACC.PL PV-ᴺᴬˢcome3SG.COND to1PL 
‘[It is?] through the prophets that He would come to us.’ (Wb. 21°3) 


15 The distinction between adverbial and prepositional cleft is made in formal terms. That is, 
a number of adverbs are formally conjugated prepositions and are considered prepositional 
phrases for purposes of this chapter. 
gl. UT SIMUS IN LAUDEM GLORIAE EIUS NOS, QUI ANTE SPERAUIMUS IN 
CHRISTO 

‘in order that we might be in praise of His glory, we who previously hoped 
in Christ [i.e. through the prophets, that He would come to us]’ 


This example is excluded on principle because of the lack of a copular form at 
the beginning, but one should note that the relative is probably dependent on 
being in indirect discourse after Latin sperauimus ‘we hoped’, rather than being 
in a cleft sentence. This analysis fits with the lack of copula and illustrates why 
copula-less examples are excluded. 

The next example is excluded because the fronted element is not a preposi- 
tion, but it is interesting because the leniting relative seems out of place given the 
Thesaurus translation, to which a nasalizing relative would be more appropriate. 


(13) is mo ro-chéess crist airi 
COP3SG.PRES more AUG-ᴸᴱᴺsuffer3SG.PRET.PASS ChristNOM.SG for3SG.MASC.ACC 
.i. báas 
i.e. deathNOM 


‘It is more that Christ has suffered for him, i.e. death. [Therefore cast off 
the foods that you love].’ (Wb. 6*8; trans. Thes.) 

‘what Christ has suffered for him is greater, i.e. death. Therefore. . .’ 
(author's translation) 


The translation offered here makes clear that ro-chéess is a relative without an- 
tecedent (GOI § 496; Ó Cathasaigh 1990) in a copular sentence (see also 
Uhlich's contribution to this volume). 

Having excluded various examples as indicated above, it remains to classify 
the prepositional clefts. Since the orthography of Old Irish is frequently ambigu- 
ous, the remaining prepositional clefts are coded as relative (14), non-relative 
(15), ambiguous (16), non-nasalising relative (17), and non-leniting relative (18): 


16 There are several ways in which a form can be ambiguous. A form like do-rat-side (Wb. 23°17) 
is ambiguous because r does not show mutations (a geminate spelling rr is not probative, since 
it might indicate nasalisation or lack of lenition). The case is similar for f, l, m, n, p, and s. A 
further type of ambiguity is due to the irregular use of Class C pronouns in relative contexts. 
While a Class C pronoun after a preverb is a sure indication of a relative verb, a Class A pronoun 
is in the analysis here only taken as indicative of a non-relative verb if the infixed pronoun is 
third person. The other persons are treated as ambiguous. Class B pronouns are treated here as 
non-relative, but since the orthography does not always distinguish [t] and [d], it is impossible 
to make a principled decision between the two classes in some cases (mostly following r with 
first and second person pronouns), as in fordon-cain (Wb. 31°16). 


(14) acht is do sochaidi no-pridchib 
but COP3SG.PRES to multitudeDAT PV-preach1SG.FUT 
‘but it is to a multitude that I will preach’ (Ml. 45?8) 


(15) huare is hifochaidib bithir hisuidib 
since COP3SG.PRES in-tribulationsDAT.PL be3SG.HAB.PASS in=thatDAT 
‘since it is in tribulations that men are for them’ (Ml. 56^15) 


(16) is airi do-roigu dia geinti 
COP3SG.PRES for3SG.NEUT.ACC PV-AUG.choose3SG.PRET GodNOM gentilesACC.PL 
‘It is therefore that God has chosen the Gentiles.’ (Wb. 5°12) 


(17) is airi as-berar 
COP3SG.PRES for3SG.NEUT.ACC PV-say3SG.PRES.PASS 
‘It is therefore that it is said.’ (Wb. 3°21) 


(18) is airi ro-cload 
COP3SG.PRES for3SG.NEUT.ACC AUG-overcome3SG.PRET.PASS 
‘It is therefore that it has been overcome.’ (Wb. 3°1) 


One category of verb that does not fit well into this system is that of contracted 
verbs: 


(19) ni do dígail for firianu tuccad recht 
NEG-COP3SG.PRES for punishmentDAT on righteousACC.PL put3SG.AUG.PRET.PASS lawNOM 
‘It is not for the punishment of the righteous that the law has been given.’ 
(Wb. 2853) 


(20) Is do tra duic-sem a ndliged so 
COP3SG.PRES for3SG.NEUT then ᴺᴬˢ(?)putAUG.PRET=3SG.MASC theACC.SG.NEUT ᴺᴬˢexpressionACC PROX 
‘It is for this, then, that he has put this expression.’ (Ml. 115^15) 


(21) is do thucad an ‘una’ 
COP3SG.PRES for3SG.NEUT ᴸᴱᴺput3SG.AUG.PRET.PASS theNOM.SG.NEUT ᴺᴬˢunaNOM 
‘It is for this that the una has been put.’ (Sg. 45°19) 
While contracted verbs correlate somewhat with leniting relative clauses, there 
are many exceptions to this trend (see Schrijver 1997b: 113-128 for an analysis 
and McCone 2006: 87-90 for objections; García-Castillero 2015 is the most re- 
cent contribution to this interesting problem). Under the system of classifica- 
tion adopted here, examples like (19) can be seen as ambiguous or, perhaps, 
non-nasalising relatives. Because they cannot be clearly categorised and may 
not be relative at all, they are left out of further consideration. There are six 
such examples: Wb. 7?2, 24°26, 283, Ml. 62°2, 71°9, Sg. 16151. 

Examples (20) and (21) are interesting because they appear to be examples 
of nasalising and leniting relatives of contracted verbs. There are two examples 
like (20): Ml. 56°11 and 115°15; and there are two examples like (21): Sg. 45°19 
and 7755. Though the two Milan forms could involve the writing of nasalised t- 
as d- in a nasalising relative, they are ultimately ambiguous: duic in 115°15 
could be a simple copying error or a late contracted form of du-uic. A parallel 
can be seen in gloss initial duic (Ml. 40°22), where there can be no question of a 
relative form. The same explanation is available for ducad (Ml. 56*11). 

Forms like thucad in Sg. seem to follow the post-Wb. Old Irish tendency to 
lenite morphologically relative forms (GOI § 495). Nonetheless, it is notable that 
both examples appear in the sequence is do thucad X ‘it is for this that X was 
put’. The form do in these glosses is the third singular masculine or neuter of 
the preposition do ‘to, for’ and is etymologically the bare preposition which is 
taken over as the conjugated form. It would thus be expected to lenite what fol- 
lowed, and it is just possible that these two forms show this lenition. For a 
slightly inexact parallel one might compare air thuccai (Ml. 42*8) or ce thuc 
(Thes. 2: 225.19 [Carlsruhe Glosses on Priscian]) for the lenition of the initial of a 
contracted verbal form. The upshot of this discussion is that the contracted ver- 
bal forms in the glosses do not appear to offer solid evidence for relative forms 
after clefted prepositional phrases. 
3 The data: A first-pass analysis 


Once the various exclusions noted above were carried out, the remaining exam- 
ples were classified as above and tallied. The results are found in Tables 5 
through 7 below: 


Table 5: Wb. main verbs after prepositional 
clefts (229 examples in total). 


non-relative 120 
ambiguous 45 
non-nasalising 46 
non-leniting 17 
relative 1 


Table 6: Ml. main verbs after prepositional 
clefts (265 examples in total). 


non-relative 101 
ambiguous 43 
non-nasalising 93 
non-leniting 18 
relative 10 


Table 7: Sg. main verbs after prepositional 
clefts (118 examples in total). 


non-relative 62 
ambiguous 16 
non-nasalising 32 
non-leniting 6 
relative 2 
This quick classification comparing clearly relative forms to clearly non-relative 
forms shows that relatives after clefted prepositional phrases are very rare in Wb. 
(< 1%), relatively rare in Sg. (approx. 3%), and uncommon in Ml. (approx. 9%).¹⁷ 
The examples classified as relative deserve a closer look. 
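
The percentages just quoted follow directly from Tables 5 through 7. The sketch below is illustrative only: it assumes that the share is computed as clearly relative forms divided by the sum of clearly relative and clearly non-relative forms, as the wording above suggests, and the names `tallies`, `stated_totals` and `relative_share` are invented for the example. It also checks that each table's categories add up to the total stated in its caption.

```python
# Tallies transcribed from Tables 5-7 (main verbs after prepositional clefts).
tallies = {
    "Wb.": {"non-relative": 120, "ambiguous": 45, "non-nasalising": 46,
            "non-leniting": 17, "relative": 1},
    "Ml.": {"non-relative": 101, "ambiguous": 43, "non-nasalising": 93,
            "non-leniting": 18, "relative": 10},
    "Sg.": {"non-relative": 62, "ambiguous": 16, "non-nasalising": 32,
            "non-leniting": 6, "relative": 2},
}

# Totals as stated in the table captions.
stated_totals = {"Wb.": 229, "Ml.": 265, "Sg.": 118}


def relative_share(counts):
    """Clearly relative forms as a share of clearly relative + clearly non-relative."""
    decided = counts["relative"] + counts["non-relative"]
    return counts["relative"] / decided


for ms, counts in tallies.items():
    total = sum(counts.values())
    status = "matches" if total == stated_totals[ms] else "differs from"
    print(f"{ms} total {total} ({status} caption); "
          f"relative share {100 * relative_share(counts):.1f}%")
```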


4 The data: a closer look at relatives 
4.1 The Wb. data (one relative example) 


The single example classified as relative in Wb. is the following: 


(22) is airi ↄnabrüi(thea) in 
NAS? 
COP3sc.pres fOF3s¢.neur py. A ERE pass theyom.p.masc 
gésci 
branchesyom.pr 


‘It is therefore that the branches were broken.’ (Wb. 5°29) 


This is not certainly relative, but at least possibly so. The question is how to 
interpret the spelling: con-abrüithea (non-relative) or con-n-abruithea (relative). 
The following example makes clear that a non-relative form is possible in 
principle: 


(23) rodbo dia ad-roni et ↄnói 
either GodNOM PV-makeAUG.3SG.PRET andLATIN PV-preserve3SG.PRES 
‘It is either God who has made and preserves.’ (Wb. 29°29) 


Since this sentence contains leniting relatives, the spelling ↄnói must stand for 
con-oi, which implies that ↄnabrüi(thea) can stand for con-abrüithea, which is 
ambiguous in the classification here. This analysis removes the only example 
given above as relative in Würzburg, meaning that there are no examples of rel- 
ative verbs after clefted PPs in the Würzburg Glosses. 


17 One might compare the clearly relative forms against all others, in which case Wb. has ap- 
proximately 0.5% relative forms, Sg. approximately 2%, and Ml. around 5%. The count given 
here is deliberately somewhat conservative, trying not to bias the discussion unnecessarily. 
4.2 Sg. data (two relative examples) 


The examples from Sg. are somewhat more interesting. The first of these is: 


(24) is i foilsigud frecndairc 
COP3SG.PRES in demonstrationDAT presentDAT.SG.MASC 
asa-gnintar i nego 7 tu. tri atarcud 
PV-recognise3SG.PRES.PASS in ᴺᴬˢego and tu through anaphoraACC.SG 
immurgu asa-gnintar hi sui 
however PV-recognise3SG.PRES.PASS in sui 

‘It is in present demonstration that it is recognised in ego and tu. [It is] 
through anaphora, however, that it is recognised in sui.’ (Sg. 19754) 


Under discussion is the first example of asa-gnintar,¹⁸ which could be seen as 
relative (see Breatnach 1994b on prepositions with added vowels in relative 
compound verbs). On the other hand, note that the preverb is regularly asa for 
this verb in Sg. There are eight total examples in Sg., and six are certainly non- 
relative;¹⁹ the two examples above in (24) are probably not relative either. 

The second example in Sg. of a relative verb in a prepositional cleft is the 
following: 


(25) cid arndid hua thuislib ildaib 
what for-ᴺᴬˢCOP3SG.PRES from ᴸᴱᴺcasesDAT.PL pluralDAT.PL.MASC 
disruthaigedar²⁰ 
(PV-)derive3PL.PRES.PASS 
‘Why is it from plural cases that they are derived?’ (Sg. 198°3) 


eDIL takes the verb as a simplex (s.v. dísruthaigidir), in which case this is in- 
deed a relative form. The evidence, however, makes it more likely that this is a 
compound verb di-sruthaigedar, in which case the form in (25) should be classi- 
fied as non-leniting. Finite forms of the verb are non-probative as to the simplex / 
compound nature of the verb, since there are only two examples additional to 
the one above, and both of these are conjunct / prototonic: hua-n-dirrudiged(d)ar 


18 The second is excluded according to the principles outlined above because there is no 
overt copula in the sentence, though in this case a cleft sentence seems clearly to be the cor- 
rect analysis. 

19 The examples are 29?3 (bis), 146^16, 180"2, 209^13, and 210710. 

20 Thes. (2: 192) suggests reading disruthaigeddar, which is accepted here as the scribe's 
intention. 
‘from which they are derived’ Sg. 33°23 and ó-diruidichther ‘from which it is de- 
rived’ Sg. 50%. Non-finite forms are more suggestive. The substantivised participle 
disruthigthe ‘derivative’ appears twelve times, always with the spellings dir(r)- / 
dír(r)-.²¹ Similarly, the verbal noun dísruthigud ‘derivation’ appears seven times 
with the spellings dir- / dír- (4x), dirr- (1x), and dirs- (2x).²² 

It is clear from the attestations that spellings of the prototonic forms and 
nominal forms indicate the lenition and / or assimilation of the s, while the one 
form that could be deuterotonic (the form in (25) above) is also the only one 
with the spelling disr-, which indicates that the s is not lenited. It is therefore 
very likely that this verb (which generally translates Latin derivatur) is a com- 
pound verb. In the context of this chapter, it should be classified as non- 
leniting. That is, it is not in a leniting relative, but it could conceivably be a 
nasalising relative clause.²³ 

From the analysis of the two possible examples of relative verbs after 
clefted PPs in Sg., we have seen that neither is actually likely to be relative. 
That leaves us with no certain examples of relatives with clefted PPs in either 
Wb. or Sg. 


4.3 Ml. data (ten relevant examples) 


The Milan Glosses have a larger number of possible examples of relative verbs 
in prepositional clefts, and it will turn out that a number of them are indisput- 
able, i.e. they cannot be explained away. It is necessary, however, to examine 
them in more detail, and it is useful to classify the relatives into nasalising rela- 
tives (three examples), leniting relatives (three examples), and ambiguous rela- 
tives (two examples), as well as two verbs for which the distinction is irrelevant 
because they are absolute relative forms, which do not distinguish leniting and 
nasalising contexts. 


21 Loci: 852, 28°4, 33°17, 56°10, 59°12, 61?1, 188°7, 188°12 (bis), 18813, 18816, and 188719. 

22 Loci: 36°1, 51°4, 53°11, 188°4 (bis), 188°8, and 19371. 

23 The retention of pretonic di might seem surprising, but this seems to be not uncommon in 
such learnedisms. One might compare: do-tá ‘differs’ glossing distamus (di-taam-ni [Ml. 117°9]) 
and deferre (di-tá [Ml. 120?6]); do-samlathar ‘compares’ glossing disimulat (di-samlathar [Ml. 222]) 
and disimulans (di-samlad [Ml. 114°3]); do-meiccethar ‘despises’ glossing detero (de-mecimm 
[Sg. 39°1]); and do-nochta ‘lays bare’ (not dinochtaid as in DIL) glossing denudatur 
(di-nochtar [Ml. 54223]). The tendency is not universal, however: do-gaib ‘diminishes’ glosses 
deminuitur (do-n-gaibter [Sg. 218°9]; elsewhere di-rogbad [Sg. 9°16]). I have not found examples 
in Wb. of this sort of learnedism, which is probably not surprising since Ml. and Sg. have more 
numerous short glosses which calque the Latin, while Wb. has fewer such glosses. 
4.3.1 Nasalising relatives 


(26) is samlid inso as-m-bertar ut... 
COP3sc.pres like3se.neur theace.sc=thisacc PV-"5says, pass so.that 
‘It is thus that they are said, in order . . .’ (Ml. 23°12)


(27) is           samlid        insin               imme-tét
     COP.3SG.PRES like.3SG.NEUT the.ACC.SG=that.ACC PV-REL-travel.3SG.PRES
     leu=som      int             ais        lósc
     with.3PL=3PL the.NOM.SG.MASC people.NOM lame.NOM.SG.MASC
     ‘It is thus that cripples walk with them.’ (Ml. 45*9)


(28) ni arindí bed n-aipert 
NEG-COP3¢¢,pres because COPss;psrsus "^ utteringnom 
asind-robrad-som 
PV-"*°38G eur’ SAY Auc.3sc.psr.supj- JSGuasc 
‘It is not because it was as an uttering that he would have said it.’ (Ml. 50^8)


It can be noted that the relative form in (27) could have been influenced by the 
(regular) nasalising relative imme-tiagat appearing earlier in the same gloss, 
but I am not inclined to accept that as strong evidence against imme-tét being 
relative. Example (28), however, upon closer consideration, can probably be set 
aside: although arindí was originally a prepositional phrase, it appears that it 
has become fully grammaticalised as a conjunction taking a nasalising relative, 
like the similarly formed isindí, dindí, and lassaní. This leaves examples (26) and 
(27), and it seems quite likely that nasalising relatives spread to this class of 
fronted prepositional phrase at the same time such relatives spread to fronted 
adverbials, with which they are essentially synonymous: 


(29) is           amin tra  as               cert                     in
     COP.3SG.PRES thus then COP.3SG.PRES.REL correct.NOM.SG.MASC/FEM the.NOM.SG.MASC/FEM
     testimin so
     text.NOM PROX
     ‘It is thus, then, that this text is correct.’ (Ml. 62°7)


(30) is amne as coir a lathar 
COP3sc.pres thus ^ COPssc Pues ng; fittingvomsc.neur its explainingyoy 
7... estoasc a chéille 


and expressingyoy its "meaning 
‘It is thus that explaining it and expressing its meaning are fitting.’ (Ml. 149)
As noted in GOI (§§ 505–506), the spread of nasalising relatives in such 
adverbials is itself secondary, probably an extension of the regular nasalisation 
found with manner clefts (GOI § 498). While Wb. shows neither the extension of 
nasalising relatives to manner adverbs like améin / amne nor the extension to 
fronted manner prepositional phrases, the fact that Ml. has both is interesting. 
It is quite likely that the spread of nasalising relatives after PPs and adverbs 
meaning ‘thus’ (i.e. samlaid and amne / amin) is connected, perhaps as a result 
of influence from the conjunction amal, which takes a nasalising relative and is 
frequently found in the collocation amal . . . is samlaid . . . (12 of the 33 
examples of samlaid in Ml. are found in this sequence).²⁴ This is naturally 
speculative, but it seems unlikely that there is no connection between the appearance 
of nasalising relatives after samlaid and amne / amin. 


4.3.2 Leniting relatives 


Beside the two or three examples of nasalising relatives, there are three exam- 
ples of leniting relatives. The first of these is: 


(31) as               du Christ     as               immaircide
     COP.3SG.PRES.REL to Christ.DAT COP.3SG.PRES.REL appropriate.NOM.SG.MASC
     in              salm      so
     the.NOM.SG.MASC psalm.NOM PROX
     ‘that it is to Christ that this psalm is appropriate’ (Ml. 1677)


The copula is clearly relative,²⁵ and since it does not nasalise the following 
immaircide, it must be a leniting relative. The text as given above is that of Thes. 
(1: 16.30). A closer look at the manuscript (see Figure 2), however, reveals that 
the reading is actually immmaircide, with three m's. This sort of error is quite 
easy to explain as due to copying, but I would like to suggest something 
slightly different, namely, that the exemplar actually had as n-immaircide (i.e. a 
nasalising relative). This was either misread or miscopied as mmmaircide, a 
simple error given that the sequences in, ni and m are frequently almost indis- 
tinguishable in Insular Minuscule. Later, mmmaircide was corrected by the 


24 The examples of this collocation in Ml. are: 26°8, 27922, 31°25, 34°6, 37712, 44^19, 49?11, 
51°28, 7443, 84*9, 96°11, and 12092. 

25 Though is and as do become interchangeable in Middle Irish, there is no evidence for such 
confusion as early as Ml., and I reject the possibility that as here could be non-relative. 
Figure 2: Ml. folio 16r, gloss 1697.²⁷


addition of an initial i in the margin.²⁶ This plausible sequence of events, while 
not provable, is attractive in that it can explain the miscopying of immaircide as 
immmaircide, as well as the fact that the initial i is not well-aligned with the 
margin of the gloss. 

The second example of a leniting relative clause is below in (33), along 
with the Latin text being glossed. In order to understand the context better, the 
previous gloss is also given, in (32). 


(32) fris    in             coais     fora-robae             som
     against the.ACC.SG.FEM cause.ACC on-REL-be.AUG.3SG.PRET 3SG.MASC
     ‘to the cause that occupied him’ (Ml. 64°12)


(33) ni fris ru-chét a propheta 
NEG-COP3sc.prrs againstss;u; AUG-"sing3sc.prerpass by prophetas, 
‘It is not with reference to it that it was sung a propheta.’ (Ml. 64°13) 
gl. usurpat hoc testimonio etiam beatus apostolus Paulus tamquam simile¹² 
non tamquam proprium¹³, quod non minus Machabeís quam apostolis 
conueniret. 
‘In this passage, as a comparison¹² [and] not as his own¹³, even the 
blessed apostle Paul uses what is not less fitting to the Machabees than to 
the apostles.’ 


26 An alternative explanation, that n-immaircide was correctly copied into the Ml. manuscript, 
but then later misread by the corrector as mmmaircide and “corrected”, amounts to roughly 
the same thing. 

27 Photo from Best (1936: plate 16") © Royal Irish Academy; reproduced by permission. 
While this example may simply represent a legitimate exception to the rule that 
clefted PPs take a non-relative main clause verb, it is important to examine 
whether any alternative interpretation exists. One alternative takes the sen- 
tence as a non-cleft and ru-chét as a headless relative subject ‘what was sung’: 
‘that which was sung by the prophet is not in reference to it (i.e. to what Paul is 
using it for)’. Normally, such sentences would be expected to have a 
substantive verb:²⁸


(34) Ni-fil          dit       daidbri-siu     nachimm-éta-sa
     NEG-be.3SG.PRES from-your poverty.DAT=2SG NEG(REL)-1SG-obtain.2SG.PRES=1SG
     óm      muintir
     from-my people.DAT
     ‘That you do not obtain me from my people is not because of your poverty.’ 
     (Meid 1974: 130-131, Táin Bó Fraích) 


Nevertheless, the division of labour of the copula and substantive verb is not 
as strict as is sometimes implied. The substantive verb is sometimes used 
where the copula would be expected (GOI § 774; also Stifter 2006: 119). The 
opposite is rarer, but it does occur in a few constructions and individual 
examples (GOI § 816; Ahlqvist 2014: 7; see also (36) below for an example: ni hi 
suidiu). The phenomenon is not well researched, so it is unclear whether 
assuming the copula here in place of the substantive verb is justified or not.²⁹ 
As a result, it is more likely that we have here a leniting relative in a cleft 
sentence with fronted prepositional phrase. 
The final example of a leniting relative seems secure: 


(35) mad                  hua  [a]icniud  bes                  amlabar
     if-COP.3SG.PRES.SUBJ from nature.DAT be.3SG.PRES.SUBJ.REL dumb.NOM.SG.MASC
     ‘[For deafness is usual to one who is dumb] if it is by nature that he is 
     dumb.’ (Ml. 59712) 


One might argue that bes is the substantive verb: ‘if it is by nature that the 
dumb one is’. This interpretation seems forced, however. As a result, the three 


28 I would like to thank Elisa Roma for bringing this example to my attention, though I do not 
assume she agrees with my interpretation here. 

29 Less likely is the interpretation: ‘it is not to that which was sung a propheta (that the com- 
parison is proper / that the comparison refers)’. This would assume that the antecedent of 
ru-chét is found in the conjugated preposition and that the whole gloss is the fronted material 
of an implied cleft sentence. 
examples with leniting relative verbs can be argued to be rather one example of 
(originally) a nasalising relative and two examples of a leniting relative. 


4.3.3 Ambiguous cases 


In the following example, the form of the preverb is ambiguous in that it could 
be relative or could contain an infixed pronoun: 


(36) cid ho deacht maicc nó ho  deacht 
although-COP 356 pres.susy from divinityp,; SONgg, or from divinityp: 
athar: ara-foima doinacht maic a 
fathergin PVpen*aSSUME3s¢.pres.susy humanitysow SONgen theacc.so.ngur 
n-í ar-roet ní hi suidiu 
“Sone PV-assumeaye.3sc.prer NEG-COP3¢¢.pres in thatpar 
‘Whether it should be from the divinity of the Son or from the divinity of 
the Father that the humanity of the Son would assume that which He has 
assumed, it is not in the preceding (text).’ (MI. 17°3) 


The verb ara-foima may be a relative verb, but it also may contain a pleonastic 
infixed pronoun, coreferential with the neuter object a n-i (see Lucht 1994: 
92-94 on pleonastic infixed pronouns with a n-i). In the latter case, this exam- 
ple does not belong here. 

The second example in this category is also somewhat uncertain: 


(37) acht is           do sochaidi      no-pridchib
     but  COP.3SG.PRES to multitude.DAT PV-preach.1SG.FUT
     ‘but it is to a multitude that I will preach’ (Ml. 45?8) 
     gl. IN MEDIO ÆCLESIÆ LAUDABO TÉ. ne putaretur singulís⁸ narraturus 
     ‘I will praise you in the middle of the church, lest it be thought that I 
     preach to individuals’ 


Stokes and Strachan (1901 = Thes. 1: 130, n. i), recognising that the no seems 
out of place, suggested reading not-pridchib ‘I will preach you’, in which the no 
is necessary to infix the second singular pronoun. The emendation, which oc- 
curs on a line break, is possible though not necessarily likely. 
4.3.4 Relative endings (i.e. no nasalising / leniting distinction) 


Finally, we may turn to cases of simplex verbs with relative endings. As noted 
above, there is no distinction of leniting or nasalising here, but the forms de- 
serve examination nonetheless. 


(38) corbu                         du reir     nach             aili
     so.that-COP.AUG.3SG.PRES.SUBJ to will.DAT some.GEN.SG.MASC other.GEN.SG.MASC
     labraimme
     speak.1PL.PRES.REL
     ‘that it should be at the will of some other that we speak’ (Ml. 31°16) 


(39) amal is           ho   imratib         gnaither             cech
     as   COP.3SG.PRES from thoughts.DAT.PL do.3SG.PRES.PASS.REL each.NOM.SG.MASC
     gnim
     deed.NOM
     ‘as it is from thoughts that each deed may be done’ (Ml. 38?5) 


Note that vowel distinctions were beginning to become confused already in 
Milan (Strachan 1903a: 52, 67), so (38) could contain labraimmi (i.e. a 
non-relative verb).³⁰ For gnaither, however (the unusual spelling of the first syllable 
notwithstanding), it seems that this must be accepted as a relative form.³¹

Of the 10 examples in Ml., at least 3 may not be relative after all: 17°3, 
31°16, and 50°8. Of the remaining examples, three are nasalising relatives (16°7, 
23°12, and 45°9), two are leniting (59°12 and 64°13), one is ambiguous (45°8), 
and one makes no distinction along those lines (3875). 
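
The arithmetic behind this summary can be checked with a short sketch. The counts are taken from the discussion in sections 4.3.1 to 4.3.4; the variable names and category labels are illustrative assumptions, not the author's terminology.

```python
# Counts copied from the discussion above; the dictionary keys are
# illustrative labels only (an assumption of this sketch).
initial_classification = {
    "nasalising": 3,
    "leniting": 3,
    "ambiguous": 2,
    "relative ending only": 2,
}

revised_classification = {
    "possibly not relative": 3,
    "nasalising": 3,
    "leniting": 2,
    "ambiguous": 1,
    "no nasalising/leniting distinction": 1,
}

total_initial = sum(initial_classification.values())
total_revised = sum(revised_classification.values())

# Examples still counted as relative once the doubtful ones are set aside.
remaining_relatives = total_revised - revised_classification["possibly not relative"]

# Proportion of nasalising relatives among the remaining examples.
nasalising_share = revised_classification["nasalising"] / remaining_relatives

print(f"initial total: {total_initial}, revised total: {total_revised}")
print(f"remaining relatives: {remaining_relatives}")
print(f"nasalising share: {nasalising_share:.2f}")
```

Both classifications account for the same ten examples; the revised one simply redistributes them after the individual examples have been re-examined.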


5 Overview / Conclusions 


The conclusions of this study are modest. From the collection of examples, it is 
clear that neither Wb. nor Sg. has any sure cases of relatives following fronted 


30 A reviewer has kindly brought to my attention two interesting examples: in tan m-bimmi 
(Ml. 24°18) and in tain diagma-ni (Wb. 3°15), which both show relative nasalisation but a 
non-relative ending, possibly representing a schwa. This suggests, even as early as Wb., that 
confusion was beginning to set in in such cases. 

31 The Ml. scribe's occasional tendency to write accented [er'] as -er (cf. a n-í as-ber titul ‘that 
which the title says’ [Ml. 24170]) is probably not relevant here. 
prepositional phrases. This conforms to the standard rules for the grammar of 
Old Irish and would seem to be an isogloss linking these two gloss collections 
against Ml., where, by contrast, there are a number of clear examples of relative 
verbs in prepositional clefts. There are as many as ten relative examples in Ml., 
with the certain number being maximally seven (see discussion above). Of these 
examples, just under half are nasalising relatives. It has been suggested that the 
similarity of some manner adverbials (e.g. amne / amin ‘thus’) and certain preposi- 
tional phrases (e.g. samlaid ‘thus’) may have led to the occasional adoption by 
both of nasalising relatives, perhaps on the model of amal, which introduces 
clauses of manner and regularly takes a nasalising relative. Once nasalising 
relatives were possible in this small set of clefted PPs, further spread in other cate- 
gories and encroachment by leniting relatives may also have become possible 
(GOI § 506). Of the three gloss collections, Ml. has the strongest representation of 
the nasalising relative generally (McCone 1980: 15-16; Ó hUiginn 1986: 63). Given 
that the nasalising relative becomes redundant already by the tenth century, the 
increase of nasalisation in Ml., followed by its rapid decrease and loss, is some- 
what puzzling. Nonetheless, I would suggest that the increased number of nasal- 
ising relatives is probably connected with the spread of relatives in prepositional 
clefts. 


5.1 ceta ‘first’ as evidence for the “Cowgill particle”? 


It is now time to return to the case of ceta ‘first’. It was argued above that 
this preverb provides some evidence for a second-position clitic “Cowgill 
particle”, *eti. Specifically, it was suggested that *kentu(u)-eti gives ceta. 
There were, however, only two cases of non-relative ceta, and both hap- 
pened to be in prepositional clefts. The examples are repeated here for 
convenience: 


(40) airní                doib   cita-rochet
     for-NEG-COP.3SG.PRES to.3PL PV-sing.AUG.3SG.PRET.PASS
     ‘For it is not to them that it was first sung.’ (Ml. 86719a) 


32 There is much more to be said here, but this is not the place. Ó Muircheartaigh (2015: 
204—217) has argued for Bangor connections for both Milan and St. Gall and affinity to Armagh 
for Würzburg. How this might play out for specific features, however, is quite an open question. 
(41) combad frisnagruade 7 
so.that-"?COPs;; «suy against-theaccpr.weur=CheekSaccp, and 
frisnaforbru cita-commairsed 


against-theacc.pr.masc/neur=CVEDIOWS ,cc, pr. PV-meetssc »sr sup; 
‘so that it might be against the cheeks and against the eyebrows that it 
would first meet’ (MI. 39°15) 


In the light of the examination of prepositional clefts undertaken above, these 
two examples from the Milan Glosses cannot be considered definitely non- 
relative. Since one cannot be sure of their evidentiary value, one must ask if 
there is any solid support left for the idea that the Cowgill particle leaves a 
trace behind after disyllabic preverbs ending in *u. The preverb ocu < *onku- in 
ocu-ben ‘touches’ is one such piece of support, as there is no other plausible 
explanation for the retention of the final syllable. 

A second piece of evidence is the preverb ceta ‘along’ in ceta-bi ‘feels, per- 
ceives’. Here, the evidence is indirect. As this ceta has a preform *kanta, it should 
have developed to relative ceta-bi and non-relative *cet-bi. Since the non-relative 
form is actually ceta-bi, there must be an analogical explanation for it. It seems 
unlikely that the relative form of the preverb would be taken over directly. The 
fact that some preverbs in relative contexts had an extra syllable was well at 
home in Old Irish, being regular for ar and imm (relative forms ara and imma), 
and as Breatnach (1994b) has shown, the pattern even occurred sporadically also 
for other preverbs. It appears unlikely that an established *cet-bi, relative ceta-bi 
would have been made into ceta-bi for both relative and non-relative without a 
good model. 

The only possible model is ceta ‘first’, but interpreting the evidence is diffi- 
cult. If the two examples (40) and (41) are relative, then we have no positive 
evidence for what the non-relative form was. There are three realistic sugges- 
tions for that form, however: it was cet; it was ceta; or there was no non- 
relative form because the preverb was only used in relative contexts. While 
García Castillero (2014: 87-89) has indeed argued that this preverb originated 
in relative contexts, it is unlikely that it did not spread from there at all. The 
textual attestation of the spread may simply be lacking. If the preverb indeed 
was found in non-relative contexts, it must have taken the form cet or ceta. If 
the non-relative form was cet, there would have been no model for ceta to be 
taken over in non-relative position in ceta-bí. On the other hand, if the non- 
relative form was actually ceta (and we happen not to have attestations of it 
because both (40) and (41) are actually relative forms), then this would support 
the argument being made here, and it would provide a model for non-relative 
ceta in ceta-bi. Finally, it may indeed be the case that one or both of the 
examples (40) and (41) is non-relative. There would then be a model for 
non-relative ceta in ceta-bi, and ceta ‘first’ would provide direct positive evidence 
for the rule that disyllabic preverbs ending in *u retain the final syllable 
before the Cowgill particle. Though the evidence is not entirely straightforward, 
we are left with a problem if the non-relative form of ceta ‘first’ was anything 
but ceta.³³


5.2 The origin of the absolute / conjunct verbal endings 


We can now briefly return to the debate about the origin of absolute and 
conjunct verbal endings in Insular Celtic. The evidence cited here will cer- 
tainly not change anyone's mind about the validity of the particle theory as 
explanation for the absolute / conjunct distinction. It does, however, pres- 
ent evidence that disyllabic preverbs ending in *u retained their second syl- 
lable in both relative and non-relative clauses. This does not happen with 
other vowels and must receive some sort of explanation, regardless of one's 
views on the origins of the absolute verbal endings. The particle theory pro- 
vides a relatively straightforward, though difficult to prove, framework for 
that explanation. 


Acknowledgement: I would like to thank my Utrecht colleagues Peter Schrijver 
and Mícheál Ó Flaithearta, the conference participants at the "Variation and 
Change in the Syntax and Morphology of Medieval Celtic Languages" confer- 
ence, and two anonymous reviewers for many helpful discussions and sugges- 
tions on the topics of this paper. 


33 A possible third option is that the forms in (40) and (41) were seen by speakers of Old Irish 
as ambiguous. If they could be seen as either relative or non-relative, they could be examples 
of the bridging context by which relatives in prepositional clefts became possible. While this 
idea has a certain appeal, it seems to be ruled out by the fact that the ambiguity of the forms 
exists only as written. Spoken aloud, the distinction between relative and non-relative would 
have been clear. 
Appendix: Examples 


Below are given all the examples of non-excluded prepositional clefts in 
Wb., Ml. and Sg., i.e. the examples that make up the data represented in Tables 
(5), (6), and (7). 


Wb. 


Non-relative: 114, 9, 283 is, P6, 15, “6, 13, 3°10, “6, 22, 484, 13, 17, 24, 27 is, "27, €23, 
445, 33, 5°16, 27, 36, “16 (bis), 6°12, 13, 19, 30 im-tiagam, "a, 14, 8*6, 16, 971, 18, 23 
(bis), P5, 7 bid, *9, 10, 427, 10°2, 3, “11, 23, 27, 1192, 5, 6, 12°21, 13°3, 5, 16, 22, 32, 
^13, 18, 29, €11, 12, 14°8, 24, 40, 926, 15713, "11, 18, 28, €23, “18, 1677 ar-focarar, 
17°20, “19, 18°5, 19°19, 20, 20°16, €21, 212, 7, “19, “1, 22°10 coiscitir, 17, 22, °41, 
€11 berir, 28, “21, 29, 24°17, 29, “1, 21, 258, €16, 26°11, “8, 25 (bis), 27°11 (bis), 29, 
18, 22, 28719, 17, “12, 19, 29°16, ^12, %6 (bis), 23, 30°25, 31°11, 3286, €13, 3377, 
3486. 


Relative: 5°29. 


Non-leniting relative: 2°3 do-téit, 3°1, 4°27 for-téit, P14, 5౯, 6°29, 30 ad-ciam, 
590, 10°29, 30, “1, 10, 1826, 19°6, “6, 21712, 25°28. 


Non-nasalising relative: 2°17, 3°21, 421, 4417, 571, 6814, P7, 45, 839, “12, 96, 7 
as-berar, “14, 925, 1074, €11, 12, “16, 1182, 12°29, 13°26, 14°33, 15716, 16°4, %14, 
1722, ^29, €23, 181, 19°14, 20412, 226, “10 do-airbertar, 23°12, 17, 925, 26, 24°14, 
22, 25°12, 2703, *8, 120, 29°21, 31710, %2. 


Ambiguous: 1°3, 2°24, 26, 125, 4°7, 27, 32, 35, 37, 5^4, 12, €17, 6°3, “14, 7°3, 14, 
445, 852, 10, 922, 10822, *8, 12°29, 13321, 426, 17°16, 18°13, 20°9, 10, 214, 32, 2337, 
€17, 44, 30 immum-ruidbed, 2583, 26°11, 27°35, 29°28, 30, 929, 31°16, 16, 32910, 14. 


Ml. 


Non-relative: 3°4, 14410, 15°10, 17°8, 20°13 ata (bis), 24°30, 26°8, 27°10 teit (bis), 
28°8, 30924, 31°1, 23, 3216 ata, 10, 34^6, “6 at-taat, 35°26, 3778 berthair, 8 berthir, 
10 téit, 10 berthair, 10 is, 38°5 gnitir, 9, 42°7 berid, 7 beirthi, 7 ra-gab, 43°2, “13, 
44°11, 14, P2, 47°17, 486, 49°11, 27, ౨7, 5085, 8, “18, 51°14, ^12 eirbthi, ^2 da-gneth, 
2 da-rigni, 10, 53°19, "8, 11 da-airilbset, 54°1, 56°3, 15 bithir, 33, “11, 60°11, 62°2, 
64°10, 67°8, ata, 8 trachtid, 24, 68°2, 3, 69°1 molfait, °3 at-ror, 7291, 12, 74°1, €21, 
413, 83°14, 88°15, 89°6, 90711, 92°12, 94°13, 15, 9, 10 teit, 96°10, 97°17, 10024, 
1014 6-7 saidi (bis), 103°26, 27 teit, 27 is, 106°11, 10812 trachtaid, 1092 (ter), 
111^15 dos-melmais, 112°20, 1142-3, 118°6, 121°8, 123°13, 124°3 (bis). 


Relative: 167, 17°3, 23°12, 31°16, 38°5 gnaither, 45°8 no-prithchib, *9, 50°8 bed, 
59°12, 64°13. 


Non-leniting relative: 2°6, 25°6, 3059, 32°17, 34°6 no-tesad, 39°15, 44°19, 50°7 
ro-cuala, 54°21, “18 no-teged, 57513, 95%, 101°6-7 du-tiagar (bis), 106°3, 111°9, 
1262, 131°14. 


Non-nasalising relative: 2°3, 14°4 ro-gabad, 4 robu, 9, “19 ar-osailcther, 16°10, 
17°18, 18°8, 19°11, 24°15, “10, 26, 29, 2688, 30°3, 31°17, 32°6 du-gnither, 35°8 
ro-gabad (bis), 9, 10, °10, 16, 18, *21, 36°3, °21, 37°12, 14, "16, ‘20, 40°20, 
42°15, 44°1, 4597, 8, 46°21, “3, 10, 47°11, 48°27, 28, 51°12 do-aisilbi, *2, ^8, 25, 
52x0, 53°11 do-airilbset [MS do airibset], 13, 54°22, da, 55°1, 5748, 64°19, 
6624, 69°11, 71°14, 74?1, 81°4-6, 8319, 84°9, 86413, 8972, 90°15, 9127, 94°10 
do-adbat, 96°18, 98°10, 10012, 108°4, 10931 (bis), 110716, 111°3, 113°7, 115714, 
12092, 121°16, 1238, 10, 126°10, 1272, 14, 13251 ro-uctha, 1 as-berat, 1332, 
13936 (bis), 8, 9, 10, 11, 14241. 


Ambiguous: 14°12, 13, 19 ro-segar, 17°2, °7 ar-roét (bis), 18°10, 21711, 26°10, 
31°25, “12, 33°12, 37216, 45°8 as-rubart, 9, 46°24, 47°8, 20, 50°7 ru-radus, 51°19, 
45g, 53°11 do-recachtar, 11 do-recatar, ^17, 6192, 661, %15, 69^1 ro-fessatar, °3 
ro-pridach, 14, 72°9, 85°10, 86719", 88°17, 96°11, 102°7, 10534, 108°12 fu-fálgi, 
113°2, 11993, 125311, 130°8, 145°4. 


Sg. 


Non-relative: 7°14, 988, 19P2, 26°7, 283, 32°2, 36^1, 3871, 41°3, 42?9, 52°1, 54°3, 
6, 56°8, 5771, 66°9, 10, 71°8, 76°7, 90°2, 95^1, 104°5, 113°3, 13834, 13971, 144°3, 
15231, 15933, 1681, 16931, 1732, 1792, 18132, 5, 18322, 18833, ^1, 19135, 19671, 
19732 ata, 11 ar-ícht (ter), 199°3, 20032, P7, 20151, 20337, "3, 8, 20455, 8, 20582, 
20722, 20810, 209°10, 29, 21351, 21588, 21751, 21888, 22239. 


Relative: 11725, 1487 


Non-leniting relative: 149"6 (bis), 158?3, 197°4, 198°3, 208?9. 
Non-nasalising: 9°10, 1088, 29°15, 3037, 3281, 35°13, 39°25, 45°9, 5074, 598, 
106°16, 140?4, 14331, 15721 15884, 1612, 183°3, 18705, 189°2, 192^4, 197°2 as-bertar 
(bis), 203°5, 2063 do-gni, 207°2, 208?1, 20971, 21074, 211°6, 213°7, K15?3, K66?1. 


Ambiguous: 1831, 6, 28°9, 40717, 6985, 74°8, 10371, 13671, 153°6, 15431, 157°3, 
188714, 195°, 20283, 203322, 206?3 con-osna. 


Britta Irslinger 
10 The functions and semantics of Middle 
Welsh X hun(an): A quantitative study 


1 Introduction 


The different types of reflexive markers are a much-discussed areal feature of 
the languages of Western Europe. While the markers of most European 
languages, e.g. German sich, French se or Italian si, are based on the PIE 
reflexive pronoun *s(w)e-,¹ English and the neighbouring Insular Celtic languages 
Welsh and Irish² employ different markers originating from intensifiers. As a 
result, reflexive markers and intensifiers are different in the first group of 
languages, but not in the second. German expresses reflexivity with the pronouns 
mich, dich, sich, etc., as in (1a) and (1b), while uninflected selbst ‘self’ is used as 
an intensifier, adnominally in (1c) and adverbially in (1d). English uses my-, 
your-, himself, etc. in both cases. 


(1) a. German:  Ich sehe         mich   im                 Spiegel.
                I   see.1SG.PRES me.ACC in.the.DAT.SG.MASC mirror
       English: I see myself in the mirror.
    b. German:  Er spricht        ständig      mit  sich.
                he speak.3SG.PRES continuously with REFL
       English: ‘He keeps talking to himself.’
    c. German:  Der             Präsident selbst wird          der
                the.NOM.SG.MASC President self   will.3SG.PRES the.DAT.SG.FEM
                Feier    beiwohnen
                ceremony attend.INF
       English: The President himself will attend the ceremony.
    d. German:  Der             Präsident schrieb        seine          Rede
                the.NOM.SG.MASC President write.3SG.PRET his.ACC.FEM.SG speech
                selbst.
                self
       English: The President wrote his speech himself.


1 Haspelmath (2001: 1501), König and Siemund (2000: 44-51). 
2 See Irslinger (2014b: 161-164) on Modern Irish and (2014b: 179—182) on Old Irish. 


Open Access. © 2020 Britta Irslinger, published by De Gruyter. This work is licensed under 
the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
A formal similarity of intensifiers / reflexives is to be observed especially 
between English and Welsh. Both languages have complex markers consisting of 
a pronoun inflected according to person, number and gender coupled with 
a second element, which is self in English, hun in North Welsh and hunan in 
South Welsh (Table 1). In addition, both markers originate from intensifiers and 
are in use with such function. 


Table 1: Paradigms of intensifiers/reflexives in Modern Welsh and English. 

             Modern Welsh                  Modern English
             North        South
Sg. 1        fy hun       fy hunan         myself
    2        dy hun       dy hunan         yourself
    3masc.   ei hun       ei hunan         himself
    3fem.    ei hun       ei hunan         herself
Pl. 1        ein hun      ein hunain       ourselves
    2        eich hun     eich hunain      yourselves
    3        eu hun       eu hunain        themselves
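
The structure of the Welsh paradigm in Table 1 (a possessive pronoun plus a second element that varies by dialect and, in the South, by number) can be made explicit in a small sketch. The function name `welsh_intensifier` and the labels used as keys are hypothetical conveniences of this illustration; only the forms themselves come from Table 1.

```python
# Possessive pronouns as listed in Table 1, keyed by (number, person).
PRONOUNS = {
    ("sg", "1"): "fy",
    ("sg", "2"): "dy",
    ("sg", "3masc"): "ei",
    ("sg", "3fem"): "ei",
    ("pl", "1"): "ein",
    ("pl", "2"): "eich",
    ("pl", "3"): "eu",
}


def welsh_intensifier(number: str, person: str, dialect: str) -> str:
    """Return the Modern Welsh intensifier/reflexive form given in Table 1.

    Hypothetical helper: `dialect` is "north" or "south".
    """
    pronoun = PRONOUNS[(number, person)]
    if dialect == "north":
        # North Welsh uses invariant hun.
        second = "hun"
    elif dialect == "south":
        # South Welsh distinguishes singular hunan from plural hunain.
        second = "hunan" if number == "sg" else "hunain"
    else:
        raise ValueError(f"unknown dialect: {dialect!r}")
    return f"{pronoun} {second}"


for (number, person) in PRONOUNS:
    print(number, person,
          welsh_intensifier(number, person, "north"),
          welsh_intensifier(number, person, "south"))
```

The sketch only restates the table; it makes no claim about how the forms arose historically.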


Because English differs from the other Germanic languages, which have 
reflexives based on PIE *se-,³ and displays a marker structurally similar to the Welsh 
one, the hypothesis of a celticism in English or at least of convergent 
developments has widely been discussed as a possible explanation.⁴

For both languages the double function of intensifier / reflexive is not yet 
to be found in the earliest documents, i.e. in Old English or Old Welsh.⁵ Both 


3 Cf. Gothic dat. sis, acc. sik, Old Norse ser, sik, Old Saxon sik, Old High German sih < 
Proto-Germanic *siz, *sike ‘himself, herself’ (Kroonen 2013: 437). 

4 See the different treatments e.g. in Preusler (1938: 187), Tristram (1999: 24), Vezzosi (2005: 
228-240), Filppula, Klemola, and Paulasto (2008: 95-97), Miller (2012: 37) and Vennemann 
(2013: 122). According to Poppe (2009: 253-258) the hypothesis remains unproven, albeit at- 
tractive. Lange (2007: 186) is sceptical and suggests conducting further research first. Contrary 
to this, van Gelderen (2019: 225) rejects any influence from Irish or Welsh on the Old English 
Lindisfarne Glosses. 

5 Old Welsh is fragmentarily attested in onomastics, glosses and a few short texts, some of 
which are difficult to understand. This material contains two clear examples of intensifying X 
hun(an). In addition, there is one reflexive construction, containing a verb with the prefix im- 
English X-self and Welsh X hun(an) were first employed as intensifiers and only 
much later as reflexive anaphors. This means that whatever conclusion can be 
reached with regard to contact influence in the case of the intensifiers cannot 
be relevant for the reflexives, as separate processes brought about their emer- 
gence in the two languages. 

In Old English, co-reference was expressed by the ordinary personal pronouns, 
which were ambiguous in the third persons, cf. (2) and (3). Disambiguation could 
be obtained by adding an intensifier like in (4) (König and Siemund 2000: 44—46). 


(2) hine   he     bewerað         mid  wepnum
    he.ACC he.NOM defend.3SG.PRES with weapons.DAT.PL
    ‘He defended himself with weapons.’ (Zupitza 1966 [Ælfric, Grammar 96.11-12]; 
    late 10th-early 11th century) 


(3) ða  behydde       Adam     hine   &   his wif      eac  swa dyde
    and hide.3SG.PRET Adam.NOM he.ACC and his wife.NOM also so  do.3SG.PRET
    ‘and Adam hid himself and his wife did the same’ (Crawford 1922 [Ælfric, 
    Genesis 3.8]; late 10th-early 11th century) 


(4) Hannibal...  hine   selfne           mid  atre       acwealde.
    Hannibal.NOM he.ACC self.ACC.SG.MASC with poison.DAT kill.3SG.PRET
    ‘Hannibal killed himself with poison.’ (Sweet 1883 [Orosius IV.11]; late 
    9th century) 


For English the expansion of the functional scope of the intensifier and its use 
as a reflexive marker can be dated precisely. While the earliest examples can be 
found around 1150, the replacement of the simple pronoun strategy by X-self 
was complete as late as the seventeenth century. Nevertheless, X-self was used 
as a reflexive in the majority of cases already by the end of the fifteenth cen- 
tury. Examples (5) and (6) illustrate the old and new strategies respectively in 
different editions of the Bible (Peitsara 1997: 288; König and Siemund 2000: 49; 
Keenan 2002: 333-350; Lange 2007: 173-177). 


(Old Welsh for Middle Welsh ym-) and possibly an infixed pronoun. Although the analysis of 
the latter is controversial (see the discussion in Irslinger 2014b: 183, 191-193), its analysis 
as a plain pronoun expressing co-reference is probable in view of Middle Welsh (see below, 
section 4.2). Overall, there is not enough evidence to draw any firm conclusions regarding 
the expression of reflexivity in Old Welsh. 
(5) Adam and his wijf hidden hem fro the face of the Lord God 
‘Adam and his wife hid themselves from the face of the Lord God.’ 
(Peitsara 1997: 321 [Wycliffe, The Old Testament, Genesis 3.8, 1380]) 


(6) And Adam hyd hymselfe and his wyfe also from the face of the LORDe God. 
(Peitsara 1997: 322 [Wycliffe, The Old Testament, Genesis 3.8, 1380]) 


Unfortunately, such detailed information is not available for Middle Welsh X 
hun(an), making it challenging to compare the development of the two languages. 

The present chapter makes a first attempt at such a comparison by means of a
quantitative study. It is organised as follows: Section 2 summarises the relevant
typological and diachronic research on intensifiers developing into reflexive
markers. Section 3 examines the number of occurrences of X hun(an) in the
Rhyddiaith Gymraeg corpus, their function and their distribution according to
text types. Section 4 then analyses the function of X hun(an) as part of
constructions coding reflexive events, as well as the semantics and syntax of the
verbs with which it occurs, also taking into account the material contained in the
Rhyddiaith y 13eg Ganrif corpus. Finally, the instances of X hun(an) as a reflexive
marker will be discussed in detail with regard to date, distribution and possible
triggers of the change.


2 Typological and diachronic aspects 


An intensifier can be adjoined to any constituent of a clause, referring to the
entity expressed by this very constituent. Examples (7a) and (7b) from König
(2001: 748) illustrate this use for two different constituents. In (7a) the intensifier
is adjoined to the NP coding the agent and refers to it (adnominal use). In (7b),
the intensifier is adjoined to the VP (adverbial use). Because a verbal action
presupposes the presence of an agent, the intensifier refers not only to the action
itself, but also, and even predominantly, to the agent who performs it
intentionally. Gast and Siemund (2006: 366) thus propose the term “actor-oriented”
instead of “adverbial” for the type in (7b), which will be adopted here.


6 Gast and Siemund (2006: 366, 371) describe the function of an 'actor-oriented' intensifier as 
‘role disambiguation’. The intensifier blocks middle readings of polyfunctional verbal or pro- 
nominal middle markers, stating who is the intentional agent. 
(7) a. The President himself will attend the ceremony. (adnominal use)
b. The President wrote his speech himself. (adverbial/actor-oriented use)
(König 2001: 748; Gast and Siemund 2006: 349)


The basic function of intensifiers is to evoke alternatives to the referent of their 
focus. In doing so, they structure the set of referents belonging to a certain situ- 
ation into a centre expressed by the intensified constituent and a periphery 
(König 2001: 749).

Intensifiers thus express co-reference with their head like reflexives, but 
their function is pragmatic instead of syntactic. In combination with reflexives 
they have a disambiguating function, i.e. adnominal intensifiers are used for 
referential disambiguation and actor-oriented intensifiers are used for role dis- 
ambiguation (Gast and Siemund 2006: 363, 370). 

Intensifiers are thus often adjoined to "full reflexives", i.e. transitive events 
in which the agent performs an action on him- or herself (8b), which he or she 
normally performs on a patient (8a). The self-direction of the action is unex- 
pected and thus semantically marked, especially in the case of negative actions. 
The optional intensifier in (8c) from German is actor-oriented and emphasises 
that the actor intentionally performed this act. English does not allow an equiva- 
lent differentiation, because the reflexive and the intensifier are identical and the 
sequence *himself himself is ungrammatical (Kemmer 1993: 52; König 2001: 758;
Gast and Siemund 2006: 366). 


(8) a. English: He kills his neighbour. 
b. English: He kills himself. 


c. German: Er tötet sich selbst. 
he kill.3SG.PRES REFL self
‘He kills himself.’ 


Because of this functional and semantic overlap, intensifiers have the potential to
develop into reflexives. When they undergo grammaticalisation, intensifiers share
a first functional expansion as markers of “full reflexives”.


2.1 Grammaticalisation pathway of intensifiers/reflexives 


Figure 1 illustrates the grammaticalisation path for Proto-Indo-European (PIE) 
*s(u)e-, which probably was originally an intensifier. In the Romance, Germanic, 
Baltic and Slavic languages, it is the root of reflexive pronouns and of reflexive 
verbal endings, which prototypically express co-reference of the agent and the 
patient of a transitive verb. From there, their scope spread to further domains of
detransitive voice, such as middle voice, reciprocal, anticausative, impersonal or
passive. Although the individual languages have reached different stages of
grammaticalisation, the development of their functional extensions follows the same
unidirectional grammaticalisation path (Haspelmath 2003: 235 with Figure 8.18).7


Figure 1: The functional development of PIE *s(u)e- from PIE to Romance.
(Stages on the path: intensifier > full reflexive > grooming/body motion >
anticausative > potential passive > passive. Markers shown: Late PIE *s(u)e-,
Classical Latin se, Late Latin se, French se, Italian si, Surselvan se-.)


By contrast, English X-self and Welsh X hun(an) cover mainly the
first stages of the grammaticalisation path. Both markers originate from
intensifiers and are still used in this function (Figure 2).


Figure 2: The functional scope of Engl. X-self and Welsh X hun(an).
(Stages on the path: intensifier > full reflexive > grooming/body motion >
other middle situation types > spontaneous events. Markers shown: Modern
English X-self, Modern Welsh X hun(an), Middle Welsh X hun(an), Middle Welsh
ym-verbs.)


The Modern English and Modern Welsh markers are used as full reflexives. In 
addition, they can also be found with verbs belonging to various middle situation 
types, but on the whole marking is much rarer than with corresponding verbs in 


7 See Irslinger (2014b: 166-168) with an overview of recent studies on PIE *s(uJe and its devel- 
opments in different languages. See also Harbert (2007: 327) on Germanic, Stéfanini (1962: 114) 
on Romance. 
languages like German or French. In Figure 2, this expansion is therefore
represented by broken lines. The fact that the same expression is employed for
intensifiers and reflexive anaphors prevents their use as markers of derived
intransitivity to a certain extent (König and Siemund 2000: 65), e.g. in verb pairs
belonging to the inchoative/causative alternation.8 In German or French, an
unmarked transitive can be clearly distinguished from its intransitive counterpart,
which is marked by sich or se respectively, as in (9a) and (9b). English and Welsh,
on the other hand, possess a considerable number of “labile” verbs that can be
constructed transitively or intransitively, with no additional marker needed in the
second case, as in (9c) and (9d) (Poppe 2009: 262-264).


(9) a. German: Sie öffnet die Tür.
she open.3SG.PRES the.ACC.SG.FEM door
‘She opens the door.’
versus Die Tür öffnet sich.
the.NOM.SG.FEM door open.3SG.PRES REFL
‘The door opens.’
b. French: Elle ouvre la porte.
she open.3SG.PRES the.SG.FEM door
‘She opens the door.’
versus La porte s’ ouvre.
the.SG.FEM door REFL open.3SG.PRES
‘The door opens.’
c. English: She opens the door.
versus The door opens.
d. Welsh: Mae hi ’n agor y drws.
be.3SG.PRES she PRED open.VN DEF door
‘She opens the door.’
versus Mae ’r drws yn agor.
be.3SG.PRES DEF door PRED open.VN
‘The door opens.’


However, this parallelism between English and Welsh can be found only in the 
modern period. In Middle Welsh, the verbal prefix ym- is productively em- 
ployed to transform transitive verbs into intransitive ones, expressing a broad 
range of middle functions. The marker, which originates from the Proto-Celtic 


8 Nevertheless, such verbs are not absent from English and the number of lexicalised reflexive 
verbs, motion middles and anticausatives has been increasing since the Middle English period, 
cf. Siemund (2010, 2014). 
preposition *ambi- ‘about, on all sides’, is predominantly used as a marker of
reciprocity. However, it also occurs with verbs denoting the other middle situation
types established by Kemmer (1993: 16-20), such as body care, body motion,
change of body position, benefactive middle, cognition middle and spontaneous
events (anticausatives) (Irslinger 2017c: 116-123). Occasionally, ym-verbs can also
act as full reflexives. Given that a full reflexive function is the first step an
intensifier goes through when expanding its scope by grammaticalisation, an analysis
of such a function is crucial to understanding when and how the intensifier
X hun(an) developed into a reflexive marker.


2.2 Middle Welsh X hun(an) in previous research 


Evans (1964: 89, §98) introduces X hun(an) as the Middle Welsh “reflexive
pronoun” in the standard handbook A Grammar of Middle Welsh, adducing a great
number of examples that illustrate its use. However, he does not distinguish
between the functions of reflexive marker and intensifier, and most of the examples
actually contain intensifying X hun, as in (10) through (13). When translating
Middle Welsh into Modern English, Evans renders Middle Welsh X hun(an) in most
cases as English X-self. The difference becomes apparent only in languages in
which reflexives and intensifiers are not identical, such as German.9 For the sake of
clarity, German translations have been added to Evans’ English ones.


(10) e ’r amherauder e hun
to DEF emperor 3SG.MASC.INTS
‘to the emperor himself / zum Kaiser selbst’ (Jones 1939: 336.33 [Gwyrtheu
Mair])


(11) neu ’r diffetheist du hun
PTC PERF destroy.2SG.PRET 2SG.INTS
‘thou thyself hast destroyed / du selbst hast zerstört’ (Williams 1951: 20.29
[Pedeir Keinc y Mabinogi])


(12) yr a gewssynt e hun
DEF PTC get.3PL.PLPF 3PL.INTS
‘what they themselves had got / was sie selbst bekommen hatten’ (Williams
1951: 46.27 [Pedeir Keinc y Mabinogi])


9 König (2001: 751-752), Haspelmath (2001: 1501). 
(13) dy anwybot dy hun
2SG.POSS ignorance 2SG.INTS
‘thy own ignorance / dein eigenes Unwissen’ (Williams 1951: 2.12-13
[Pedeir Keinc y Mabinogi])


Only in three examples, (14), (15) and perhaps (57) below, does X hun(an)
function as a reflexive marker. In German, this is rendered as the reflexive
pronoun dich10 followed by the intensifier selbst.


(14) na chapla dy hun
NEG reprove.2SG.IMPV 2SG.REFL
‘do not reprove thyself / tadle dich nicht selbst’ (Lewis 1925: 23.28 [Cynghorau
Catwn])


(15) ony ledy dy hun
unless kill.2SG.PRES 2SG.REFL
‘unless thou dost kill thyself / außer wenn du dich (selbst) tötest’ (Jones
1941: 24.25 [Cynghorau Catwn])


This construction co-occurs with the one in (16), in which the verb hoffi ‘to 
praise’ is turned into the prefixed ym-hoffi ‘praise oneself’. Here X hun functions 
as an intensifier. 


(16) nac ym-hoffa vyth dy hun
NEG PV-praise.2SG.IMPV ever 2SG.INTS
‘do not ever praise thyself / lobe dich niemals selbst’ (Lewis 1925: 29.37
[Cynghorau Catwn])


Parina (2007) criticises this analysis, arguing that the instances of X hun contained 
in the Middle Welsh text Pedeir Keinc y Mabinogi (PKM, ca. 26,000 words) are to 
be considered as intensifiers corresponding to the different types established by 
typological research (by König 2001 and others). Parina also maintains that this
text contains no instances of full reflexives coded with X hun(an).

The examination of the ym-verbs contained in PKM by Irslinger (2017c)
yielded no examples of full reflexives such as (16) either. Instead, all ym-verbs in
said text were found to belong to the group of middle situation types. This


10 The special reflexive pronoun sich only figures in the third person. In all other persons, the 
respective personal pronoun in dative or accusative case is used, cf. Irslinger (2014b: 171-172). 
clearly shows that the corpus of PKM is too small to contain all possible
expressions of reflexivity in Middle Welsh.


3 The corpus-based study 


3.1 The corpora: Rhyddiaith y 13eg Ganrif and Rhyddiaith 
Gymraeg 1300-1425 


The following study is based on the corpora Rhyddiaith y 13eg Ganrif: Fersiwn 
2.0 (Isaac et al. 2013) and Rhyddiaith Gymraeg (Luft, Thomas and Smith 2013), 
covering together the whole period of Middle Welsh. 

Rhyddiaith y 13eg Ganrif contains nearly half a million words from 27 texts
preserved in 17 manuscripts. The main textual genres represented are history
and law, which together, in approximately equal parts, make up about 90 percent
of the corpus.12 The remaining 10 percent includes mostly short or fragmentary
texts belonging to the Mabinogion, natural history, religion, romance,
and wisdom literature.

Rhyddiaith Gymraeg 1300-1425 contains some 2.8 million words from over
100 texts preserved in 54 manuscripts. The texts belong to all medieval genres,
namely genealogy, geography, grammar, history, law, Mabinogion, medicine,
natural history, religion, romance and wisdom literature.13


11 Texts, titles and manuscript pages/folios are cited according to Rhyddiaith y 13eg Ganrif
and Rhyddiaith Gymraeg / Welsh Prose 1300-1425 unless stated otherwise. Translations are my 
own, unless another author is indicated. 

12 The history section consists of three versions of Brut y Brenhinoedd from NLW MS. Peniarth 44, 
Llanstephan 1 and the Dingestow Court manuscript (NLW MS. 5266). Although these texts are in- 
dependent translations from Latin, they are nevertheless very similar. Occasional passages with 
identical wordings are due to coincidence (Sims-Williams 2016: 55). The law texts from British 
Library Cotton Caligula A.iii, NLW MS. Peniarth 29, NLW MS. Peniarth 30, British Library Cotton 
Titus D.ii and British Library Additional 14931 all belong to the Iorwerth redaction. Due to the spe- 
cial character of this textual genre, they contain many passages with identical readings, which are 
also preserved in the later versions of Rhyddiaith Gymraeg 1300-1425. 

13 Lange (2007: 81-82) states that in Old English the occurrence of intensifiers is
genre-sensitive. They are found more frequently in texts closer to oral registers and
directly addressing the reader, while they are rarer in scientific and formal registers.
This seems to be the case in Middle Welsh as well, but it is not possible to test this
hypothesis for the time being. Rhyddiaith Gymraeg gives separate word counts only for
the manuscripts, which mostly contain texts belonging to different genres, but not for
the texts themselves.
3.2 Quantitative and functional analysis of X hun(an)


The analysis in the present section is based on Rhyddiaith Gymraeg 1300-1425
alone, which is more representative and balanced because of its size and
textual variation. The results will therefore describe the language of the second
half of the Middle Welsh period. Nevertheless, they are also valid for the earlier
period covered by Rhyddiaith y 13eg Ganrif, as all functional types are also
found there in roughly similar proportions. In addition, the Bruts and especially
the law texts contain passages that have identical or very similar counterparts
in the later versions. Rhyddiaith y 13eg Ganrif will be considered in detail
in section 4.

With the help of the wordlist, Rhyddiaith Gymraeg 1300-1425 was searched
for all occurrences of hun, hunan and hunein in different spellings, including
misspellings. The following homographs were then excluded: hun, hvn ‘sleep’;
hun, hvn ‘one, only’ (including hun used attributively and yr hun introducing
a relative clause); hvn as an unusual spelling of the demonstratives hwn, hynn;
and a handful of unclear instances. The corpus yielded 4,091 instances of
X hun14 (Table 2).
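
The spelling-based first pass of such a search can be sketched as follows. This is only an illustration under stated assumptions: the pattern `X_HUN` and the function `candidate_forms` are hypothetical names, the pattern covers only the variants named in this chapter (hun/hvn, hunan/hvnan, hunein/hvnein/huneyn/hvneyn), and it does not capture misspellings. Homographs such as hun ‘sleep’ or hun ‘one, only’ cannot be separated by spelling and would still have to be excluded by reading each passage.

```python
import re

# Hypothetical pattern: u/v alternation in the stem, optional -an,
# and the plural endings -ein / -eyn. Misspellings are not covered.
X_HUN = re.compile(r"^h[uv]n(?:an|e[iy]n)?$", re.IGNORECASE)


def candidate_forms(wordlist):
    """Return the word forms whose spelling matches hun(an)/hunein."""
    return [word for word in wordlist if X_HUN.match(word)]


# Illustrative word forms; hwn and hynn are demonstratives and must not match.
sample = ["hun", "hvn", "hunan", "hvnan", "hunein", "huneyn", "hwn", "hynn"]
print(candidate_forms(sample))
```
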


Table 2: Instances of X hun in Rhyddiaith Gymraeg 1300-1425 sorted by genre.

                   Instances
Genre              All     Different
Genealogy          -       -
Geography          21      18
Grammar            24      18
History            1548    546
Law                677     316
Mabinogion         228     128
Medicine           34      26
Natural History    4       4
Religion           792     328
Romance            631     408
Wisdom             132     116
Total              4091    1908


14 In the rest of the chapter, X hun will be used in place of all graphic and grammatical
variants, i.e. X hun(an), X hvn(an) and plural X hunein, X hvnein, X huneyn, X hvneyn.



In medieval corpora, popular texts are typically preserved in multiple copies
that are more or less identical. The Brut y Brenhinoedd or the Ystoria Carolo
Magno: Chronicl Turpin, for example, occur in fifteen manuscripts. While similar
but non-identical wordings show the range of possible expressions for a certain
concept, identical passages are duplicates that would distort the results of a
quantitative analysis. These duplicates have thus been eliminated from the
corpus, reducing the data by more than half.
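
The size of this reduction can be checked against the figures of Table 2. The following sketch simply re-adds the per-genre counts printed there and derives the share of removed duplicates; the variable names are illustrative, and the resulting percentage is a derived value, not one stated in the text.

```python
# Per-genre counts from Table 2: (all instances, different instances).
table2 = {
    "Geography": (21, 18),
    "Grammar": (24, 18),
    "History": (1548, 546),
    "Law": (677, 316),
    "Mabinogion": (228, 128),
    "Medicine": (34, 26),
    "Natural History": (4, 4),
    "Religion": (792, 328),
    "Romance": (631, 408),
    "Wisdom": (132, 116),
}

all_instances = sum(a for a, _ in table2.values())
different_instances = sum(d for _, d in table2.values())

removed = all_instances - different_instances
reduction = removed / all_instances  # share of the data removed as duplicates

print(all_instances, different_instances)
print(f"{removed} duplicates removed ({reduction:.1%} of all instances)")
```
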

Table 3 lists all constructions with X hun in the corpus. Duplicates were
identified according to the following criteria: if the same construction is
involved or the same constituent is intensified, passages are considered
duplicates even if they differ lexically. In (17) and (18), e hun follows a
personal pronoun as an adnominal intensifier. Although different pronouns are
involved, i.e. ynteu and efo, the passage was counted only once.

Where the constructions differ or X hun occurs with another constituent, the
passages were considered different, even when the rest is identical. In contrast
to (17) and (18), e hun following the verb aeth in (19) is an actor-oriented
intensifier. This passage was thus counted as a separate instance.


(17) ac ynteu ehun a aeth y gastell dimlyot
and 3SG.MASC 3SG.MASC.INTS PTC went.3SG.PRET to castle Dimlyot
‘and he himself went to Dimlyot Castle’ (Brut y Brenhinoedd; Oxford Jesus
College Manuscript 111, page 38° (149): 29)


(18) ac efo e hun a aeth y gastel dimloec
and 3SG.MASC 3SG.MASC.INTS PTC went.3SG.PRET to castle Dimloec
‘and he himself went to Dimloec Castle’ (Brut y Brenhinoedd; NLW MS.
3035 (Mostyn 116), page 61”: 13)


(19) ac ynteu a aeth e hun yg castell dimlot
and 3SG.MASC PTC went.3SG.PRET 3SG.MASC.INTS to castle Dimlot
‘and he went to Dimlot Castle himself’ (Brut y Brenhinoedd; NLW MS.
Peniarth 46, page 254: 16)


There are no functional differences between the variants X hun and X hunan or 
between the use of singular and plural forms e hun and e(u) hunein. Accordingly, 
the passages containing these variants were considered as identical. 
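
The duplicate criteria described above can be expressed as a small procedure. This is a minimal sketch, not the procedure actually used for the study: the class `Attestation`, its fields and the passage label are hypothetical, and the functional classification of each witness is assumed to have been done by hand beforehand. Two witnesses of a passage count as one instance when X hun has the same function with the same kind of constituent, regardless of wording or of the variant (hun, hunan, hunein) used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    passage: str       # hypothetical identifier of the shared passage
    construction: str  # function of X hun and the constituent it attaches to
    witness: str       # manuscript in which the passage occurs
    wording: str       # surface form; ignored when comparing


def deduplicate(attestations):
    """Keep the first witness for each (passage, construction) pair."""
    seen = set()
    unique = []
    for att in attestations:
        key = (att.passage, att.construction)
        if key not in seen:
            seen.add(key)
            unique.append(att)
    return unique


# Examples (17)-(19): the same passage of Brut y Brenhinoedd in three witnesses.
witnesses = [
    Attestation("Brut: Dimlyot", "adnominal after pronoun",
                "Jesus College 111", "ynteu ehun a aeth"),
    Attestation("Brut: Dimlyot", "adnominal after pronoun",
                "NLW 3035", "efo e hun a aeth"),
    Attestation("Brut: Dimlyot", "actor-oriented after verb",
                "Peniarth 46", "ynteu a aeth e hun"),
]

unique = deduplicate(witnesses)
print([att.witness for att in unique])
```

With this input, (17) and (18) collapse into one instance, while (19) is kept as a separate one, as in the counts reported here.
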


3.3 Constructions with X hun 


The 1,908 instances of X hun contained in Rhyddiaith Gymraeg 1300-1425 were
analysed according to their function. The results are listed in Table 3. According
to this analysis, X hun is employed mainly (i.e. in at least 97.33% of the cases)
as an intensifier in the different constructions illustrated in sections 3.3.1 to 3.3.4.
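
The percentages can be recomputed from the column totals of Table 3. The sketch below assumes that the share of 97.33% is the complement of the 51 instances occurring in “full reflexive” situations; this reading is an inference from the totals, not a statement in the text. Recomputed values may differ from the printed ones in the last decimal place.

```python
# Column totals from the "Total" row of Table 3.
intensifier_totals = {
    "NP": 282,
    "poss+NP": 647,
    "pronoun": 211,
    "prepositional pronoun": 322,
    "head": 77,
    "verb": 316,
}
NOT_CLASSIFIED = 2
FULL_REFLEXIVE_EVENTS = 51
GRAND_TOTAL = 1908


def share(count, total=GRAND_TOTAL):
    """Percentage of the total, rounded to two decimal places."""
    return round(100 * count / total, 2)


# The columns must add up to the number of different instances.
column_sum = sum(intensifier_totals.values()) + NOT_CLASSIFIED + FULL_REFLEXIVE_EVENTS
assert column_sum == GRAND_TOTAL

for label, count in intensifier_totals.items():
    print(f"{label}: {share(count)}%")

print("full reflexive events:", share(FULL_REFLEXIVE_EVENTS))
print("all other instances:", share(GRAND_TOTAL - FULL_REFLEXIVE_EVENTS))
```
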

In 51 instances, X hun following a verb or verbal noun occurs in situations 
that comply with the definition of "full reflexives". However, it would be mis- 
taken to assume that X hun has the function of a reflexive marker in all these 
cases. Rather, these instances show a number of different constructions, which 
will be examined in detail in section 4 to determine the function of X hun. 


3.3.1 Adnominal intensifiers 


Adnominal intensifiers follow an NP or proper name (20), an NP preceded by a
possessive adjective (21), a pronoun (22)15 or a prepositional pronoun (23).
Both the simple NP and the construction POSS-NP may or may not be
preceded by a preposition, e.g. y henw e hvn ‘his own name’, o ’y henw e hvn


15 The stressed possessive pronoun eidaw is used both predicatively and substantivised, cf. 
Evans (1964: 54—55) and eGPC (s.v. eiddo) for the respective constructions. The 17 instances of 
e hun following substantivised eidaw have been counted as Poss+NP in Table 3. 
Table 3: Quantitative functional analysis of X hun in Rhyddiaith Gymraeg 1300-1425.

                                  Intensifier adjoined to:                              “full
Genre           Total    n.c.16   NP      poss+NP  pron.   prepos.  head    verb       reflexive
                                                           pron.                       event”
Genealogy       0        -        -       -        -       -        -       -          -
Geography       18       -        1       4        2       4        4       2          1
Grammar         18       -        6       -        -       9        2       1          -
History         546      1        75      213      70      88       11      79         9
Law             316      -        20      121      52      39       23      61         -
Mabinogion      128      -        18      48       12      18       8       24         -
Medical         26       -        3       7        1       12       -       3          -
Natural Hist.   4        -        1       1        1       1        -       -          -
Religious       328      -        66      98       26      63       10      49         16
Romance         408      1        86      113      41      64       15      74         14
Wisdom          116      -        6       42       6       24       4       23         11
Total           1908     2        282     647      211     322      77      316        51
Percentage17    100.00            14.78   33.91    11.06   16.87    4.04    16.56


‘from his own name’. In most genres, the POSS+NP type is significantly more fre- 
quent than all other types. 

The examples are given within their contexts to illustrate the function of 
the intensifiers, i.e. structuring the respective situations according to the roles 
of the participants involved, which may be either central or peripheral. 


(20) [Ac yny diwet hwn pymp kenedyl yssyd yny chyuanhedu nyd amgen. normanyeit.
bryttannyeit. saesson. fichtieit. ac ysgottieit.]
ac o hynny oll nyd oed gynt yn y
and of PROX all NEG be.3SG.IMPF before in 3PL.POSS
medu o ’r mor pwygilyd namyn bryttannieit eu hun.
possess.VN from DEF sea to.another except Britons 3PL.INTS
[‘And today, there are five nations who inhabit it, namely the Normans,
the Britons, the Saxons, the Picts, and the Scots] and of all these, in the
past no one possessed it from sea to sea, but the Britons themselves.’
(Brut y Brenhinoedd; BL Cotton Cleopatra MS B V part i, page 2: 7-9)


(21) ac y dodes ynteu ar y ran kymre
and PTCL put.3SG.PRET 3SG on 3SG.MASC.POSS part Cambria
o y henw e hvn.
from 3SG.MASC.POSS name 3SG.MASC.INTS
‘and he called his part Cambria from his own name.’ (Parry 1937: 24 [Brut
y Brenhinoedd; BL Cotton Cleopatra MS B V part i, page 11*: 3])


(22) [Sef y rodes y auarwy y nei. llundein ac yarllaeth keint. Ac a rodes y
theneuan y nei y llall yarllaeth kernyw.]
Ac ynteu ehun yn vrenhin ar gwbyl.
and 3SG.MASC 3SG.MASC.INTS in king on whole
[‘To Avarwy his nephew he gave London and the Earldom of Kent, and to
Tenevan, his other nephew, he gave the Earldom of Cornwall,] and he
himself was king over the whole.’ (Parry 1937: 70 [Brut y Brenhinoedd; BL
Cotton Cleopatra MS B V part i, page 34”: 23])


(23) [a choffau na wnathoed y vrawd ydaw ef dim o’r cam.]
namyn ef a wnathoed cam y v vrawd
but 3SG.MASC PTCL do.3SG.PLPF wrong to 3SG.MASC.POSS brother
ac idaw e hvn.
and to.3SG.MASC 3SG.MASC.INTS
[‘and to remember that his brother had done him no wrong,] but that he
had done wrong to his brother and to himself’ (Parry 1937: 50 [Brut y
Brenhinoedd; BL Cotton Cleopatra MS B V part i, page 24”: 15])


The characteristic morphology of Middle Welsh prepositional objects in (23)
contrasts clearly with Old English, where intensifiers after prepositional objects
are also frequent (van Gelderen 2000: 47). On the one hand, most Middle Welsh
prepositions possess personalised forms originating mostly from their fusion
with following personal pronouns, the so-called “prepositional pronouns” or
“inflected prepositions”; idaw ‘to him’, for example, is the third singular
masculine of y ‘to’. On the other hand, the preposition and the following pronoun
are separate units in Old English, as in (24).


(24) heht hie bringan to him selfum
order.3SG.PRET her.ACC bring.INF to him.DAT self.DAT
‘ordered (them) to bring her to himself.’ (van Gelderen 2000: 47 [Genesis
2629])


3.3.2 Intensifiers as heads 


Like English X-self, X hun can occur alone, without a preceding noun or
pronoun, thus claiming the function of a pronoun for itself.18 This use is frequently
found in comparisons after no(c) ‘than’, and after kanys ‘since’, namyn ‘but’ 
and onyt ‘except’, but also without any preceding word, as in (25). In all cases, 
the use of a personal pronoun or a personal pronoun + intensifier would be 
possible as well. 

Almost without exception, X hun as a head codes the subject, but in a few 
cases it is found after uninflected prepositions. 


(25) [Arglwydi heb ef pei barnewch wi oll ellwg hengyst.]
Mu hunan a ’e lladwn ef.
1SG.INTS PTCL 3SG.MASC.INFX kill.1SG.IMPF.SUBJ 3SG.MASC
[‘Lords, said he, if you all would judge to release Hengist,] (I) myself would
kill him.’ (lit.: ‘It’s myself, who would kill him.’) (Brut y Brenhinoedd;
Cardiff MS. 1.362 [Hafod 1], page 63”: 19)


3.3.3 Actor-oriented (adverbial) intensifiers 


Actor-oriented intensifiers are further subdivided into two types: exclusive and
inclusive. In the exclusive type (26a), the meaning of the intensifier corresponds
roughly to ‘personally’; in the inclusive type (26b), the intensifier could be
replaced by ‘too’ or ‘also’ (König 2001: 748).


18 Evans (1964: 89) calls these intensifiers “heads”, while Parina (2007: 394) labels the
function “diskursiv [discursive]”.
(26) a. I have swept this court myself. Nobody helped me.
b. I have myself swept this court. I know how difficult that is. 


Welsh uses identical markers for both types (Parina 2007: 393). Example (27)
illustrates the actor-oriented exclusive use, which is the predominant one in
the corpus.19


(27) ac y cladpwyt ef yny gaer a
and PTCL bury.PRET.IMPERS 3SG.MASC in DEF city PTCL
adeiliassei e hunan yn anrydedus.
build.3SG.PLPF 3SG.MASC.INTS PTCL honourable
‘and he was buried honourably in the city which he had built himself.’
(Brut y Brenhinoedd; BL Cotton Cleopatra MS. B V part i, page 11": 21)


3.3.4 Additional resumptive pronouns 


An additional pronoun can stand between the intensified constituent and the 
intensifier, cf. (28) with a prepositional pronoun and (29) with a possessive con- 
struction. This pronoun refers to the intensified constituent. 

However, since additional pronouns are frequently present in one version and
absent in an otherwise identical one, their pragmatic effect does not
seem to be very significant. The author of the Cotton Cleopatra version of the
Brut has a strong preference for them. Overall, resumptive pronouns are
relatively rare. They occur most frequently after finite verb forms, i.e. in 8.54% of
all verbs followed by intensifiers.20
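
The rate quoted for finite verbs can be reproduced from the counts given in footnote 20 and the column totals of Table 3. The following sketch does this for every construction type; apart from the value for verbs, the resulting rates are derived here for illustration and are not stated in the text.

```python
# Instances with an additional resumptive pronoun (footnote 20).
with_resumptive = {
    "NP": 2,
    "poss+NP": 29,
    "pronoun": 1,
    "prepositional pronoun": 20,
    "verb": 27,
}

# All intensifier instances per construction type (Table 3, "Total" row).
all_intensifiers = {
    "NP": 282,
    "poss+NP": 647,
    "pronoun": 211,
    "prepositional pronoun": 322,
    "verb": 316,
}

# Percentage of each construction type that carries a resumptive pronoun.
rates = {
    label: round(100 * with_resumptive[label] / all_intensifiers[label], 2)
    for label in with_resumptive
}

for label, rate in rates.items():
    print(f"{label}: {rate}%")
```
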


(28) [A gorchymyn a oruc aganipus yr freinc ar eu heneit ac ev hanreith eu bod 
kyn vfydet y lyr ac yw verch.] 
ac y bythynt idaw ef e hvn. 
and PTCL beac ag. tO3sc.masc 3SGmasc 3SGmasc.inrs 
[‘And Aganippus bade the French, on their lives and their possessions, to 
be as obedient to Lear and to his daughter] as they would be to himself’


19 Out of 316 instances of actor-oriented intensifiers, only about 20 appear to be inclusive. In 
several cases, however, it was not immediately evident which use was intended. A more de- 
tailed examination would be necessary. 

20 NP + pronoun: 2 instances; possessive NP + pronoun: 29; pronoun + pronoun: 1; preposi- 
tional pronoun + pronoun: 20; verb + pronoun: 27. In Table 3, these numbers are contained in 
the counts of the respective groups. 
(Parry 1937: 39-40 [Brut y Brenhinoedd; BL Cotton Cleopatra MS. B V part i,
page 19”: 1]) 


(29) [ac yna y dodes corineus ar y ran ef . . .]
o y henw ef e hun kerniw.
from 3SG.MASC.POSS name 3SG.MASC 3SG.MASC.INTS Cornwall
[‘And then Corineus named his part . . .] Cornwall after his own name’
(Parry 1937: 23 [Brut y Brenhinoedd; BL Cotton Cleopatra MS. B V part i,
page 11: 9])


4 Ym-verbs and simple verbs + X hun 
4.1 Semantics 


In Middle Welsh, both ym-verbs and simple verbs occur with X hun, coding 
“full reflexives” and similar situation types. The following section will thus give 
a semantic analysis of the two groups. 

In this section, the nineteen examples found in Rhyddiaith y 13eg Ganrif
will be considered as well. In the following tables, the numbers in the first
column refer to these nineteen examples, those in the second to the attestations
in Rhyddiaith Gymraeg 1300-1425.


Table 4: Semantics and frequency of ym-verbs followed by X hun. 


Rhyddiaith y 13eg Ganrif 


am- ‘about’ 1 ymdeith ‘to walk about’
1 ?ymogel ‘to take care’ (< *‘to watch about’?)
reciprocal 1 ymguro ‘to beat (one another or oneself)’, here: reciprocal
body care 1 ymwisc ‘to dress’
1 amgreffinnaw ‘to scratch oneself’
body movement 1 ymdroi ‘to turn (oneself)’
1 ymdyrchafel ‘to raise (oneself)’
spontaneous event 1 ymdangos ‘to appear, to show oneself’
ymagor ‘to open’ (of doors)
benefactive 1 ymwledu ‘to feast’ (or reciprocal?)
(prototypical) 1 ymborth ‘to feed, sustain (oneself)’

benefactive 1 1 emdiffryd ‘to defend oneself’
(marked) / 2 ymgyuoethogi ‘to enrich oneself’
positive self-directed actions / 1 ymdyrchauel ‘to raise oneself (to kingship)’
reflexive 1 ymwneuthur yn vrenhin ‘to make oneself king’
1 ymwneuthur yn iach ‘to save oneself’
1 ymwassanaethu ‘to serve oneself’
4 12 ymrydhau ‘to free oneself’
1 ymroddi ‘to give oneself, to submit oneself’
ymostegu ‘to calm oneself, to maintain silence’
self-awareness 2 ymadnabot ‘to know oneself’
self-improvement 1 ymbrofi ‘to prove oneself’
self-love 3 ymhoffi ‘to praise oneself’
1 ymuoli ‘to praise/admire oneself’
self-criticism 1 ymddiheuraw ‘to excuse oneself’
2 ymgeryddu ‘to reproach oneself, to punish oneself’
1 ymgyfyawnhau ‘to justify oneself’
negative self-directed actions 1 ymdoddi ‘to consume oneself’
1 ymlycru ‘to corrupt oneself’
self-punishment 1 2 emboeni ‘to punish oneself’
ymgosbi ‘to punish oneself’
suicide 1 1 ymdihennidio ‘to execute oneself’
2 ymgrogi ‘to hang oneself’


I argued above (section 2.2) that in the combination ym-verb + X hun the prefix 
codes co-reference, while X hun functions as an actor-oriented intensifier added 
for role disambiguation. As stated by Gast and Siemund (2006: 365-367, 370), 
the intensifier blocks middle readings of polyfunctional verbal (or pronominal) 
middle markers, stating who is the intentional agent. This is confirmed by the 
fact that all instances found in the database are reflexive or belong to middle sit- 
uation types, whereas reciprocal ym-verbs are almost absent. One example is ym- 
guro ‘to beat (one another or oneself)’ (Ystoriau Saint Greal, NLW MS. Peniarth 11, 
page 239”: 9), which in the passage in question is clearly reciprocal. 

Table 4 lists them according to the increasing markedness of co-reference,
starting with typical middle situation types, covering a number of different
positive and negative self-directed actions and ending with the highly marked
verbs denoting suicide. Of course, for the verbs located in the middle of the
table it is sometimes difficult to determine where the benefactive middle ends and
the full reflexive starts. While ymdyrchafel ‘to raise (oneself)’ belongs to the
middle situation types when denoting a body movement, it is instead benefactive
and self-directed when denoting a metaphorical movement such as a rise
in rank (‘to raise oneself to kingship’). Ymroddi ‘to give oneself, to submit
oneself’ is, with its 16 attestations across different genres, the most frequent verb.

The unprefixed verbs listed in Table 5 cover the same semantic fields, i.e.
positive and negative self-directed actions, including even the more detailed
semantics like self-awareness, self-punishment or suicide. As with the ym-verbs,
synonymous or nearly synonymous verbs are available for several meanings,
implying that the semantic scope of both groups is actually relatively small.


Table 5: Semantics and frequency of simple verbs + X hun in “reflexive” situations. PRON 
indicates that co-reference is expressed by an infixed pronoun or a possessive adjective, 
rather than by X hun. 


Rhyddiaith y 13eg Ganrif

reflexive, “neutral” 11
PRON bwrw X hun ‘to throw oneself’ (to the ground)
PRON ffustyaw X hun ‘to beat oneself’ (of a bell)
PRON kymunaw X hun ‘to communicate oneself’ (religious)
PRON rhwymo X hun ‘to bind oneself’ (by contract)

benefactive / positive self-directed actions 5
PRON amdiffyn X hun ‘to defend oneself’
PRON cymorth X hun ‘to help oneself’
PRON gwneuthur X hun yn iach ‘to make oneself safe’
PRON nerthau X hun ‘to help oneself’
gwneuthur X hun yn iach ‘to make oneself safe’
amdiffyn X hun ‘to defend oneself’
iachau X hun ‘to save oneself’
gwneuthur X hun yn iach ‘to make oneself safe’

self-awareness
adnabot X hun ‘to know oneself’

self-improvement
PRON ymendio X hun ‘to amend oneself’²¹
ardymheru X hun ‘to moderate oneself’
kymedroli X hun ‘to moderate oneself’

self-love
PRON ganmawl X hun ‘to praise oneself’
2 moli X hun ‘to praise oneself’

self-criticism
PRON angreiffto X hun ‘to reproach oneself’
PRON angreitho X hun ‘to reproach oneself’
PRON barnu ehun ‘to judge oneself’
PRON galw X hun ‘to call oneself (a wretch)’
PRON kymryt X hun ‘to take oneself (for a fool)’

negative self-directed actions
PRON gwatwaru X hun ‘to ridicule oneself’
PRON gweled X hun ‘to see / consider oneself (as ugly)’
PRON roddi X hun ‘to give oneself (in danger)’
PRON taraw X hun ‘to strike oneself’
PRON twyllaw X hun ‘to cheat oneself’
PRON ymelldigo X hun ‘to curse oneself’
cablu X hun ‘to blame oneself’
cnoi X hun ‘to chew up oneself’

self-punishment
PRON poeni X hun ‘to punish oneself’

suicide
PRON brathu X hun ‘to stab oneself’
PRON lladd X hun ‘to kill oneself’
PRON bot X hun yn y lad ‘to kill oneself’ (lit. ‘to be oneself at one’s killing’)
llad X hun ‘to kill oneself’


21 Ymendáu ‘to rectify, improve’, with its variants amendio, emendio and mendio, is not an
ym-verb, but a borrowing from Old French amender ‘to correct’. It is also constructed both
transitively and intransitively.


Table 6: Light-verb constructions with gwneuthur ‘to do’. 


suicide 1 


Rhyddiaith y 13eg Ganrif 


gwneuthur X hun y leith ‘to effect oneself one’s death’ 
gwneuthur y leith X hun ‘id.’ or ‘effect one’s own death’ 
gwneuthur X hun y dihenyd ‘to effect oneself one’s death’ 


Finally, both groups also contain light-verb constructions with gwneuthur ‘to
do’ (Table 6). In these examples, co-reference is not marked on the verb or verb
phrase, but in the possessive that precedes the associated noun; see (63) to (66)
below.
There are almost no unprefixed reflexive verbs with “neutral” semantics, i.e.
in which the effect of the verbal action on the agent is neither explicitly positive 
nor negative. This statement is however not based on their verbal semantics 
alone, but on the precise contexts in which the verbs occur. Thus gweled X hun 
‘to see / consider oneself as sth.’ or galw X hun ‘to call oneself sth.’ are neutral in 
principle, but in their actual contexts they convey a negative judgement of the 
agent about himself. Ffustyaw X hun ‘to beat oneself’, on the other hand, would 
be a negative self-directed action in the case of a human agent, but in the only 
attestation found in the corpus it refers to a bell. 

The verbs listed in (30) occur with and without ym-, as in (31) and (32). 
Their semantics are largely synonymous, as some of them occur in similar con- 
texts or in parallel versions of the same text. 


(30) ymadnabod : adnabot X hun ‘to know oneself’
ymboeni : PRON poeni X hun ‘to punish oneself’
ymuoli : moli X hun ‘to praise oneself’
ymwneuthur yn iach : gwneuthur X hun yn iach ‘to make oneself safe’


(31) Na vawl dy hun yn ormod ac na chapla
NEG praise.2SG.IMPV 2SG.REFL too much and NEG reproach.2SG.IMPV
dy hun yn ormod.
2SG.REFL too much
‘Do not praise yourself too much and do not reproach yourself too much.’ 
(Cynghorau Catwn; NLW MS. Llanstephan 27, page 168": 16) 


(32) Nac ym-uawl du hun ac nac ym-hoffa du hun. 
NEG PV-praise.2SG.IMPV 2SG.INTS and NEG PV-admire.2SG.IMPV 2SG.INTS
‘Do not praise yourself and do not admire yourself.’ (Cynghorau Catwn; 
NLW MS. Peniarth 3 part ii, page 38: 11) 


If over time one strategy of reflexive marking is replaced by another, it is to be 
expected that both variants co-occurred during a transitional period. It is thus 
not surprising that verbs can be found both with and without prefix. In contrast 
to this, some verbs, like those denoting different types of suicide, always occur 
either prefixed or unprefixed. Some of these even contradict the typological 
rule according to which, in languages with two different reflexive markers, the 
heavier marker is used for the more marked situations (Kemmer 1993: 62). In 
this sense, the ym-verbs ymdihennidio ‘to execute oneself’ and ymgrogi ‘to hang
oneself’ are atypical. On the other hand, brathu X hun ‘to stab oneself’ and
lladd X hun ‘to kill oneself’ always occur unprefixed.

The reason for this unexpected behaviour may be that the corresponding 
ym-verb is widely used as a reciprocal or has already been lexicalised with an- 
other meaning. The default reading of verbs denoting different kinds of killing 
or killing with various kinds of weapons or instruments is reciprocal (33). 


(33) llad ‘to kill’ : ymladd ‘to fight’ (< *‘to kill each other’)
saethu ‘to shoot, to fire’ : ymsaethu ‘to fire at each other’
taraw ‘to strike’ : ymdaraw ‘to strike one another’
gwan ‘to stab, to kill’ : ymwan ‘to joust, to fight in single combat’
brathu ‘to stab’ : *ymvrathu ‘to stab one another’


If a hypothetical *ymvrathu were derived from transitive brathu ‘to stab’, its
most likely meaning would be ‘to stab one another’ and not ‘to stab oneself’.
This is not an issue in the case of ymgrogi ‘to hang oneself’ and ymdihennidio
‘to execute oneself’, as mutual hanging or executing is not possible.²²

Another lexicalised ym-verb is ymwelet ‘to meet each other’, so that PRON 
gwelet X hun translates Latin se uidens (34). 


(34) gwelet ‘to see’ : ymwelet ‘to meet each other’ 


In the following case, the situation is even more complex, as the adjective iach 
‘healthy, well, whole’ is the basis of four different verbs, one of which is an ym- 
verb (35). Although the semantics of ymiachau ‘to bid farewell’ are reciprocal, 
they cannot be derived from the underlying adjective or the corresponding un- 
prefixed verbs (36), because in that case the meaning should be ‘to heal each 
other’. The meaning ‘to bid farewell’ is rather based on the concept of ‘leaving
each other in a healthy condition’ or ‘wishing each other health’. The lexicalised
semantics of ymiachau thus seem to prevent a reflexive interpretation. By
contrast, the reflexive iachau X hun ‘to save oneself’, based on the unprefixed
verb, displays the expected semantics.


22 An interesting typological parallel can be found in Modern Greek verbs with the meaning
‘to kill oneself, to commit suicide’. Besides the compound αυτοκτονώ (active) ‘to commit
suicide’, in which co-reference is expressed by the first constituent αυτο- ‘self-’, there are a
number of other verbs with ‘middle morphology’, i.e. their inflection as medio-passives signals that
the agent performs the action on him- or herself. Several verbs have additional meanings typical
for other middle situation types, e.g. spontaneous events like ‘to perish’, or intransitive ‘to
smash’: σκοτώνω (active) ‘to kill’: σκοτώνομαι (middle) ‘to kill oneself; to die, to perish, to
struggle’, κρεμώ (active) ‘to hang’: κρεμιέμαι (middle) ‘to hang oneself’, απαγχονίζω (active)
‘to hang’: απαγχονίζομαι (middle) ‘to hang oneself’, τσακίζω (active) ‘to break, to squeeze’:
τσακίζομαι (middle) ‘to smash, to struggle, to kill oneself’.


(35) adjective iach ‘healthy, well, whole’
transitive iachaf, iachu ‘to heal, cure’
transitive & intransitive iacháf, iacháu ‘to make whole(some), heal, cure; save’
reflexive iachau X hun ‘to save oneself’
reciprocal ymiachau ‘to bid farewell’


(36) ac heb ohir kymryt y bererin ffonn ae
and without delay take.VN DEF pilgrim staff and-3SG.MASC.POSS
balmidyden. a ymiachau ae dylvyth a
palm.branch and bid.farewell.VN with-3SG.MASC.POSS family PTCL
oruc. ac yr mor yd aeth.
do.3SG.PRET and DEF sea PTCL go.3SG.PRET
‘and without delay he took his pilgrim staff and his palm branch, and he
bade farewell to his family, and went to the sea.’ (Ystoria Bown de
Hamtwn; NLW MS. Peniarth 5, page 148")


In Middle Welsh, ym-verbs are usually either reflexive or reciprocal but rarely 
both at the same time. This is different in German and French, where sich and se 
often mark both categories. In Welsh, it was only as late as the sixteenth century 
that some reciprocal verbs started to be used also as reflexives. For instance, 
ymddiddan ‘to speak with each other, to converse’ acquired the additional mean- 
ing ‘to amuse oneself’ (Irslinger 2017c: 119). Another example is ymadnabod 
(reciprocal ‘to know each other’), which in Cynghorau Catwn occurs as the equivalent
of adnabot X hun ‘to know oneself’ (37). In addition to this single reflexive
use, there are several attestations of reciprocal ymadnabot ‘to know each other’.

(37) ymadnabot ‘to know oneself’
PRON adnabot X hun ‘to know oneself’ = ymadnabot ‘to know each other’
4.2 Syntax


From the lists above it becomes clear that in most cases the equivalent of an 
intransitive ym-verb with reflexive function is not the unprefixed verb + X hun, 
but rather the construction “pronoun + unprefixed verb + X hun”, whereby the 
pronoun codes the direct object of the transitive verb. 

The pronoun is ambiguous with regard to co-reference in the third person. 
Thus the reader or listener has to infer from the context that the third person 
plural pronoun eu is used co-referentially in (38) and (39), but not in (40). As a 
consequence, intensifiers are frequently added for reference disambiguation. 


(38) [A gwedy gwelet o antigonus . . . yr aerua honno. neilltuaw a oruc]
a y oreugwyr gydac ef. y geisiaw
and 3SG.MASC.POSS best.men with 3SG.MASC to try.VN
ev hamdiffin.
3PL.POSS defend.VN
[‘And after Antigonus . . . had seen this slaughter, he drew aside,] and his
leading men with him, to try to defend themselves.’ (Parry 1937: 12 [Brut y
Brenhinoedd; BL Cotton Cleopatra MS. B V part i, page 5": 27])


(39) [A gwedy nachaffant hynny. wynt a ervynnassant cannyat y adeiliat caer 
onadunt ev hun. kyulet achroen ech.] 
y geissiaw ev hamdiffin rac ev gelynnyon. 
to try.VN 3PL.POSS defend.VN from 3PL.POSS enemies
[‘And when they did not / get that, they asked for permission to build a 
fortress of their own, as broad as an ox-hide,] to try to defend themselves 
from their enemies.’ (Parry 1937: 12 [Brut y Brenhinoedd; BL Cotton 
Cleopatra MS. B V part i, page 55”: 19]) 


(40) [A gwedy eu bod tridieu yn ymlat ar kestyll o bop ryw vod. ar gwyr. y mewn
yn ymlad ac wynt yn wraul ac yn llauurus.]
anvon a orugant ar brutus y erchi idaw dyuot eu
send.VN PTCL do.3PL.PRET to Brutus to ask.VN to.3SG.MASC come.VN 3PL.POSS
hamdiffyn.
defend.VN
[canys ny ellynt wy ymderbynneit ac wynt rac meynt y nyueroed allan.]
[‘And after they had fought against the castles for three days in every sort
of way, and the men within had fought them bravely and laboriously,]
they sent to Brutus to ask him to come to defend them, [for, because of
the great numbers outside, they could not resist them.’] (Parry 1937: 12
[Brut y Brenhinoedd; BL Cotton Cleopatra MS. B V part i, page 6': 10-11])


Pronouns coding coreferential or non-coreferential direct objects occur in two 
different constructions, i.e. with a finite verb or with a verbal noun: 
- an infixed pronoun denoting the object precedes a finite transitive verb 
(41) (Evans 1964: 55); 
- a possessive adjective precedes a noun (42). 


This second construction is identical to the one discussed above in (21), ex- 
cept that the noun is replaced by a verbal noun. Both variants, i.e. POSS+VN 
and POSS+NOUN, occur in the parallel versions of Saith Doethion Rhufain, (42) 
and (43). On the formal level, the intensifier following a POSS+VN construction 
is, of course, adnominal. Nevertheless, these constructions, which outnumber 
those with finite verbs by far, will be discussed together with reflexive finite 
verbs. 

Both the infixed or independent pronoun and the possessive marker agree 
with the subject (which is co-referent with the object) and the intensifier with 
regard to person, number and gender. 


(41) Yna ef a ’e trewis e hun
then 3SG.MASC PTCL 3SG.MASC.INFX strike.3SG.PRET 3SG.MASC.INTS
a e yluin y dan benn y vronn.
with 3SG.MASC.POSS beak under 3SG.MASC.POSS breast
‘Then it [a bird] struck (it) itself with its beak under its breast.’ (Ystoriau
Saint Greal; NLW MS. Peniarth 11, page 63”: 24)


(42) [a chyndrwc yd aeth arnaw ef hynny.]
a e vrathu e hun a wnaeth y dan
and 3SG.MASC.POSS stab.VN 3SG.MASC.INTS PTCL do.3SG.PRET under
y von a e gyllell
3SG.MASC.POSS breast with 3SG.MASC.POSS knife
[yny dygwyd yn varw y'r llawr.]
[‘And he took it so ill, that] he stabbed (him) himself under his breast with
his knife, [until he fell dead to the ground.’] (Saith Doethion Rhufain;
Oxford Jesus College MS. 111, page 131' (541): 34)
(43) a e vrath e hun a wnaeth am benn
and 3SG.MASC.POSS stabbing 3SG.MASC.INTS PTCL do.3SG.PRET around top
y vron
3SG.MASC.POSS breast
‘and he stabbed (him) himself under his breast with his knife.’ (Saith
Doethion Rhufain; Oxford Jesus College MS. 20, page 56°”: 1)


The very same constructions code pronominal direct objects of transitive verbs, 
cf. (44) with infixed pronouns and finite verbs and (40) above with the POSS+VN
construction. 


(44) [Ac ewythyr ydaw ef ehvn adylyhei gwledychu gwedy custennyn: ac ef a
ryuelawd a hwnnw.]
ac a y delhiis ac a y
and PTCL 3SG.MASC.INFX capture.3SG.PRET and PTCL 3SG.MASC.INFX
rodes yng karchar.
put.3SG.PRET in prison
[‘And his uncle should have ruled after Constantine; and he fought with
him] and captured him and put him in prison.’ (Parry 1937: 104 [Brut y
Brenhinoedd; BL Cotton Cleopatra MS. B V part i, page 96”: 22-23])


The use of ambiguous pronouns plus disambiguating intensifiers is thus essen- 
tially the same as in Old English, as in (2) to (4) above. This strategy is still the 
predominant one in Middle Welsh (Table 7). 

Both constructions are found in Modern Welsh as variants, cf. (45a) and 
(46a) with additional co-referential pronouns versus (45b) and (46b) without 
them.²³ Contrary to the development in English, the older strategy has not yet
completely vanished in Welsh. Despite this, the pronominal constructions do 
not figure in all grammars of Modern Welsh. 


(45) a. Fe ’i gwelodd ei hunan yn y drych.
PTCL 3SG.FEM.INFX see.PRET 3SG.FEM.INTS in DEF mirror
‘She saw herself in the mirror.’ (Thomas 1996: 269)
b. Gwelodd ei hunan yn y drych.
see.PRET 3SG.FEM.REFL in DEF mirror
‘She saw herself in the mirror.’ (Thomas 1996: 269)


23 See Thomas (1996: 269), Borsley, Tallerman, and Willis (2007: 222), Poppe (2009: 254, foot- 
note 7). 
(46) a. Rwy ’n gallu fy ngweld fy hun
be.1SG.PRES PRED can.VN 1SG.POSS see.VN 1SG.INTS
yn y drych.
in DEF mirror
‘I can see myself in the mirror.’ (Poppe 2009: 254, footnote 7)
b. Rwy ’n gallu gweld fy hun yn y drych.
be.1SG.PRES PRED can.VN see.VN 1SG.REFL in DEF mirror
‘I can see myself in the mirror.’ (Poppe 2009: 254, footnote 7)


Table 7 gives the distribution of ym-verbs and simple verbs followed by X hun 
coding “full reflexive” events from both corpora (cf. Tables 4, 5 and 6 above). 

X hun can be analysed as a reflexive marker in only 14 cases,
and even some of these are controversial. One example is contained in Rhyddiaith
y 13eg Ganrif; the others are found in Rhyddiaith Gymraeg. The 13 cases of
reflexive X hun in Rhyddiaith Gymraeg thus constitute 0.68 % of the 1,908 instances
of X hun found in that corpus. Although this number validates the dating of the beginning of the
use of X hun as a reflexive marker in the second part of the Middle Welsh period, it
is certainly insufficient to justify Evans’ (1964: 89) labelling of X hun as the Middle
Welsh “reflexive pronoun”.


Table 7: Distribution of ym-verbs and simple verbs followed by X hun coding "full reflexive" 
events. 


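Columns: Genre; Total; light-verb (co-ref. not marked on VP); ym-verb; simple verb, pron. + finite verb; simple verb, poss.-vn; simple verb, finite verb; simple verb, vn.
X hun = intensifier in the light-verb, ym-verb, pron. + finite verb and poss.-vn columns; X hun = reflexive marker in the finite verb and vn columns.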
Genealogy 
Geography 2 1 1 
Grammar 
History 34 3 17 2 13 
Law 9 9 


Mabinogion 
Medicine 
Nat. Hist. 1 1 
Religion 34 2 12 14 4 2 
Romance 20 6 3 9 2 
Wisdom 23 12 3 3 5 
8 39 12 2 
Total 123 5 57 47 14 
Percentage 100.00 4.07 46.34 38.21 11.38
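
The percentages in the Total row of Table 7, and the 0.68 % share of reflexive X hun reported above, can be rechecked with a few lines of arithmetic. The following is only a minimal sketch: the dictionary labels and the helper name `share` are illustrative and not part of the source, while the counts are those given in the table and the running text.

```python
# Counts from the "Total" row of Table 7. The labels are illustrative
# shorthand for the four column groups, not terminology fixed by the source.
totals = {
    "light-verb": 5,
    "ym-verb": 57,
    "simple verb, X hun = intensifier": 47,
    "simple verb, X hun = reflexive marker": 14,
}
grand_total = sum(totals.values())


def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Percentage of each construction type among all "full reflexive" events.
shares = {label: share(n, grand_total) for label, n in totals.items()}

# Reflexive X hun in Rhyddiaith Gymraeg: 13 of the 1,908 instances of X hun.
reflexive_share = share(13, 1908)

for label, pct in shares.items():
    print(f"{label}: {pct:.2f} %")
print(f"reflexive X hun among all X hun: {reflexive_share:.2f} %")
```

The four group totals add up to the 123 events of the Total column, so the percentages in the last row are shares of all “full reflexive” events, not of all instances of X hun.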


4.3 X hun as a reflexive marker in the corpora 


Table 8 lists the 14 instances of X hun as a reflexive marker together with addi- 
tional information on the manuscripts that contain them, their dates according 
to Rhyddiaith Gymraeg 1300-1425²⁴ based on Huws (2000: 58-64) and the
forms in which they are attested. 


4.3.1 Distribution and date 


The attestations range from the beginning to the end of the Middle Welsh period 
and are found both in earlier and later manuscripts. Reflexive X hun is slightly 
more frequent in fourteenth and fifteenth-century manuscripts (mainly the Red 
Book of Hergest and the Red Book of Talgarth), but it does not seem that older 
textual versions were systematically modernised during the process of copying.
It is of course possible that the conservative written registers preserve features 
that had already largely been abandoned in the spoken language. 


24 See http: 
Table 8: Attestations of reflexive X hun according to manuscripts (N = negator, IMPV =
imperative 2SG).


Text                            Manuscript      Date           Form    Verb
Bown de Hamtwn                  Pen. 5          1350           N-1SG   moli X hun ‘to praise oneself’
                                Jesus 111       c. 1375-1425
Credo Athanasius, Introduction  Pen. 5          1350           VN      adeilat X hun ‘to edify oneself’
Cronicl Turpin                  Pen. 5          1350           3SG     amdiffyn X hun ‘to defend oneself’
                                Pen. 9          1300-1350
                                Jesus 111       c. 1375-1425
Cyngh. Catwn                    Pen. 3 pt. ii   c. 1275-1325   IMPV    kymedroli X hun ‘to moderate oneself’
Cyngh. Catwn                    Llanst. 27      c. 1375-1425   IMPV    adnabot X hun ‘to know oneself’
Cyngh. Catwn                    Llanst. 27      c. 1375-1425   IMPV    ardymheru X hun ‘to moderate oneself’
Cyngh. Catwn                    Llanst. 27      c. 1375-1425   N-IMPV  cablu X hun ‘to blame oneself’
Cyngh. Catwn                    Llanst. 27      c. 1375-1425   IMPV    moli X hun ‘to praise oneself’
Delw'r Byd                      Jesus 111       c. 1375-1425   3PL     cnoi X hun ‘to chew up oneself’
Gwyrtheu Mair                   Pen. 14         1250-1300      N-2SG   llad X hun ‘to kill oneself’
Y Groglith                      Pen. 7          c. 1275-1325   N-3SG   gwneuthur X hun yn iach ‘to make oneself safe’
Y Groglith                      Shrewsb. 11     c. 1375-1425   IMPV    gwneuthur X hun yn iach ‘to make oneself safe’
Y Groglith                      Pen. 5          1350           IMPV    iachau X hun ‘to save oneself’
                                Llanst. 27      c. 1375-1425
Y Groglith                      Pen. 5          1350           VN      iachau X hun ‘to save oneself’
Only Y Groglith shows some variation on the same passage, which renders
Matthew 27:42 in (47). In the light-verb construction gwneuthur X hun yn iach 
‘to make oneself safe’, X hun is used with a reflexive function in the oldest man- 
uscript, Peniarth 7, in (48). On the other hand, the two younger versions make 
use of pronouns, as in (49) and (50). Peniarth 5 uses the synonymous verb 
iachau with reflexive X hun, in (51). Lastly, Efengyl Nicodemus adds another 
sentence expressing the same content again with a light-verb construction + in- 
tensifier, featuring the prefixed verb ymwneuthur (52). 


(47) [alios salvos fecit]
se ipsum non potest salvum facere
REFL.ACC INTS.ACC.SG NEG can.3SG.PRES safe.ACC.SG make.INF
[‘He made others safe;] (him) himself he cannot make safe.’ (Weber 2007
[Matthew 27: 42; Biblia Sacra Vulgata])²⁵

(48) ereill a wna ef yn yach ac ny
others PTCL make.3SG.PRES 3SG.MASC PRED safe and NEG
wna e hvn
make.3SG.PRES 3SG.MASC.REFL
‘He makes safe others, and he doesn’t make himself (safe).’ (Y Groglith;
NLW MS. Peniarth 7, page 58” (215): 2)


(49) [Ereill heb wy a wnaei ef yn iach.]
ac ny dichawn y wneuthur e hun.
and NEG be.able.3SG.PRES 3SG.MASC.POSS make.VN 3SG.MASC.INTS
[‘Others, they said, he saved,] and he is not able to save (him) himself.’
(Y Groglith; NLW MS. Llanstephan 27, page 105”: 17)


(50) Ereill a wna yn iach ac ny ’s
others PTCL make.3SG.PRES PRED safe and NEG 3SG.MASC.INFX
gwna e hun
make.3SG.PRES 3SG.MASC.INTS
‘He makes safe others, and he doesn’t make (him) himself (safe).’
(Y Groglith; Shrewsbury MS. 11, page 113: 16)


25 See also [https: .bibelwissenschait.de/startseite/wissenschaitliche- 
(51) [Ereill a wna ef yn iach.]
ac ny eill iachau e hun.
and NEG can.3SG.PRES save.VN 3SG.MASC.REFL
[‘He saves others,] and cannot save himself.’ (Y Groglith; NLW MS. Peniarth
5, page 7*: 20)


(52) [Ereill a wnaei ef yn iach ac ny dichawn y wneuthur e hun.]
Ym-wnaet yn iach e hun.
PV-do.3SG.IMPV PTCL safe 3SG.MASC.INTS
[‘He made safe others, and he is not able to make himself (safe).’] ‘He shall
make himself safe.’ (Efengyl Nicodemus; NLW MS. Peniarth 5, page 32": 14)


Texts with reflexive X hun usually also contain instances of the pronominal 
constructions, unless they are very short and thus do not possess many reflex- 
ive verbs altogether. The only exception is Cynghorau Catwn, which is the text 
with the highest number of reflexive X hun in the corpus. 

The earliest attestation is (53) from Gwyrtheu Mair in Peniarth 14, which 
Huws (2000: 58) dates to the second half of the thirteenth century. GPC gives 
1250 as a date for the text, i.e. the beginning of this period. Evans (1964: 89) 
points out that the expected form with an infixed pronoun would be *ony'th 
ledy du hun. 

The same text has two other reflexive constructions, one with the POSS+VN 
construction (54) and one with an ym-verb (55). The latter is replaced by a 
POSS+VN construction in the later version in Llanstephan 27 (56). 


(53) [na elly caffael yechyt am e pechaut ry wnaethost]
ony ledy duhun
unless kill.2SG.PRES 2SG.REFL
[‘you cannot get redemption from the sin you have done,] unless you kill
yourself’ (Gwyrtheu Mair; NLW MS. Peniarth 14, Jones 1941: 24)


(54) [Llawer hep ef a wnaeth o drwc]
ac en diwethaf e lad ehun.
and in last 3SG.MASC.POSS kill.VN 3SG.MASC.INTS
[‘He did, said he, a lot of evil,] and in the end he killed himself’ (Jones
1941: 25 [Gwyrtheu Mair; NLW MS. Peniarth 14])
(55) Ac ena e dechreuws e vicedonus em-boeni
and then PTCL begin.3SG.PRET DEF vicedominus PV-punish.VN
ehun
3SG.MASC.REFL
‘and then the vicedominus started to punish himself’ (Jones 1939: 148
[Gwyrtheu Mair; NLW MS. Peniarth 14])


(56) Ac yna onewyd y dechreuawd teophilus y
and then anew PTCL begin.3SG.PRET Teophilus 3SG.MASC.POSS
boeni e hun
punish.VN 3SG.MASC.INTS
‘and then Teophilus began to punish himself anew’ (Gwyrtheu Mair; NLW
MS. Llanstephan 27, page 176': 10)


This isolated example is followed by the instances contained in NLW MS. Peniarth 
5 (White Book of Rhydderch). Of these, Evans (1964: 89) cites (57) from Credo 
Athanasius following Lewis' analysis of the passage. According to Lewis (1930: 
193), this text was translated in the second half of the 13th century. Adeilat e hun is 
found in the introduction, which was not part of the Latin text, but was drafted by 
the Welsh translator. While admitting that the omission of y could be a scribal mis- 
take, Lewis prefers to consider adeilat ehun as an early example of the reflexive 
use of X hun. He argues that this use, which had become very common by 1615, 
had to have started long before then (Lewis 1930: 195). 

By contrast, GPC (s.v. hun’, section b) considers it a scribal mistake
and lists the example as a POSS+VN construction (58). 


(57) Pob cristaun weithonn a dyly adeilat ehun
every Christian now PTCL must.3SG.PRES build.VN 3SG.MASC.REFL
[truy weithredoed da yn temyl y Duv a hynny yn gyuuch ac y carhaedo truy 
gret a gobeith a charyat teyrnas gvlad nef.] 
‘Every Christian now has to build himself [through good works into a 
temple to God and this so high that he will achieve through belief and 
hope and love the kingdom of heaven.’] (Credo Athanasius [Introduction]; 
NLW MS. Peniarth 5, page 48”: 13) 


(58) Pob cristaun . . . a dyly y [drll.]²⁶ adeilat ehun
(NLW MS. Peniarth 5, page 14g. B v. 196) 


26 drll. = darllener, darlleniad ‘read(ing), version’. 
Another interesting case is that of amddiffyn X hun²⁷ in (59), which has exact
parallels in NLW MS. Peniarth 9, page 1”: 4 and Oxford Jesus College MS. 111, 
page 95" (400): 9. The reflexive use of e hun in (59) is at variance with 13 instan- 
ces of the POSS+VN construction as in (60) from Brut y Brenhinoedd, Brut y 
Tywysogion, Ystoria Carolo Magno: Rhamant Otfel and Ystoriau Saint Greal. 


(59) [Canys rolond a dugassei gantaw trossawl troydic hir.]
ac a hwnnw yd amdiffynnwys e hun educher.
and with DIST PTCL defend.3SG.PRET 3SG.MASC.REFL till evening
[‘for Rolond had brought with him a long twisted bar,] and with that he
defended himself until the evening’ (Williams 1892: 463 [Ystoria Carolo
Magno: Chronicl Turpin; NLW MS. Peniarth 5, page 74 (63): 34])


(60) ym-rodi a wnaethant y eu hamdiffyn e hunein
PV-submit.VN PTCL do.3PL.PRET to 3PL.POSS defend.VN 3PL.INTS
o hynny allan
from DIST on
‘they submitted themselves to defend themselves from then on’ (Brut y 
Brenhinoedd; Oxford Jesus College MS. 111, page 39° (154): 27) 


4.3.2 Change through linguistic convergence? 


Strikingly, no instances of reflexive X hun are found in "native" texts like the 
Mabinogion or the laws, but all of them occur in translations or adaptations from
Latin or Old French. One could speculate that the change in Middle Welsh was 
at least partly triggered by contact influence, but it is hard to find any evidence 
to sustain this claim. This may be due to the following reasons. 

In some cases, both the Latin texts and the corresponding Welsh transla- 
tions were extremely popular, so that it is impossible to determine which ver- 
sion underlies a translated text. Later versions may not necessarily rely on the 
Latin original, but rather on other translations. 

But even in cases where the source is clear, the Welsh translators fre- 
quently rendered the content of a passage in their own words rather than pro- 
ducing verbatim translations. 


27 The am- in amddiffyn (sometimes ymddiffyn) retains the original prepositional meaning 
‘about, at all sides’ (Vendryes 1927: 50). Amddiffyn is constructed mostly transitively and thus dif- 
fers from ym-verbs containing the grammaticalised prefix, which are predominantly intransitive. 
As the example of Y Groglith has shown, different versions use different
constructions, all of which are well rooted in the language. The reflexive use of 
X hun does not seem to be triggered by the underlying Latin (or Greek) text. 

The Cordeilla passage in the Cotton Cleopatra Brut is another example of 
the independence of the Welsh version (62), which contains two verbs denoting 
suicide that do not figure in the Latin text (61). Both of them use the POSS+VN
construction. 


(61) [Eam quoque ad ultimum captam in carcerem posuerunt]
ubi ob amissionem regni dolore obducta
where by loss.ACC kingdom.GEN grief.ABL overwhelmed.NOM.SG.FEM
sese interfecit.²⁸
REFL.ACC kill.3SG.PRET
‘[Finally they captured and imprisoned her] where overwhelmed by grief
at the loss of her kingdom she killed herself.’ (Reeve and Wright 2007:
45 §32 [Geoffrey of Monmouth's Historia regum Britanniae])


(62) [A gwedy medyliaw ohoney am y hen deilyngdawd ry gollassei. ac nat oed 
obeith idi ymatkyuot ohynny.] o diruawr dolur hynny y gwnaeth hy hun y 
lleith. nyt amgen nogyd y brathu hy hun a chillell adan y bronn yny gollas 
y heneid. ac yna y barnwyd mae dybrytta agheu y dyn y llad e hun 
‘And after thinking over her former dignity which she had lost, and she had
no hope of raising herself out of it, out of exceeding grief over it she did /
effected herself her death - that is, she stabbed herself with a knife
under the breast so that she lost her life. And at that time it was considered
the most ignominious death for a person to kill himself.’ (Parry 1937: 41
[Brut y Brenhinoedd; BL Cotton Cleopatra MS. B V part i, page 20": 14-16])


Latin sese interfecit has the reduplicated and thus emphatic reflexive marker 
sese, but no intensifier. The Welsh translator chose a light-verb construction, 
for which the analysis in (63) seems probable, especially in view of the similar 
light-verb constructions in (64) to (66). 


28 The First Variant Version (ed. Wright 1988: 27) has sese interemit ‘she killed herself’. 

29 Parry (1937: 41) reads hun ‘sleep’ and translates y gwnaeth hy hun y lleith ‘she slept the 
sleep of death', assuming a metaphorical or euphemistic expression for committing suicide. 
Although hun 'sleep' is occasionally used this way in Middle Welsh, Parry's analysis seems 
unconvincing in view of (64) and (65), which mention the instrument with which the act was 
carried out, and (66), which might be a poss-NP construction. 
Example (66) is most likely to be read as a POSS-NP construction ‘effecting
his own death’, but, as the object of a verb occasionally stands between the finite
verb and the intensifier, it cannot be excluded that this passage corresponds to
the others with a slightly modified syntax.


(63) y gwnaeth hy hun y lleith
PTCL do.3SG.PRET 3SG.FEM.INTS 3SG.FEM.POSS death
‘she effected personally her death’, lit. ‘she did herself her death’ (Brut y 
Brenhinoedd; BL Cotton Cleopatra MS. B V part i, page 20%: 14-16) 


(64) [A phan giglev pilatus hynny] 
y gwnaeth e hun y dihenyd a ’e gyllell.
PTCL do.3SG.PRET 3SG.MASC.INTS 3SG.MASC.POSS death with 3SG.MASC.POSS knife
[‘And when Pilatus heard this,] he effected his own death with his knife.’ 
(Ystoria Bilatus; NLW MS. Peniarth 5, page 11°: 14) 

(65) y gorvc ehvn y leas a y gyllell
PTCL do.3SG.PRET 3SG.MASC.INTS 3SG.MASC.POSS death with 3SG.MASC.POSS knife
‘he effected his own death with his knife.’ (Ystoria Bilatus; NLW MS.
Peniarth 7, page 63" (236): 20)


(66) [Pan gigleu archelaus mab herot hynny; digallonni a oruc. a gossot y wayw
yn y daear a mynet ar y vlaen]
a gvneuthur y leith ehun.
and do.VN 3SG.MASC.POSS death 3SG.MASC.INTS
[‘When Archelaus the son of Herod heard this, he lost heart, and putting
his lance on the ground and going on its point,’] ‘effecting his own
death’ / ‘and effecting himself his death’ (Ystoria Titus; NLW MS. Peniarth
5, page 37": 48)³⁰


The Latin influence on Gwyrtheu Mair is more difficult to assess. The Latin text 
is transmitted in several slightly different versions. Example (68) expresses the 
order to kill oneself with the simple reflexive pronoun te followed by the inten- 
sifier ipsum. The emphatic pronoun temet in (67) is already present in Classical 


30 Cf. Ehrmann and Pleše (2011: 12.546) for the Latin text: Herodes amputauit lanceam suam
et fixit in terram et iactauit se super et mortuus est. ‘Herod broke off his spear, fixed it in the
ground, and threw himself over it and died.’
Latin. In Vulgar, Late and Middle Latin, pronouns enlarged by -met became
increasingly frequent and were often fused together with the intensifier,31 as in
the example.

Although the Welsh version is an independent renarration (69), the similar- 
ity of the Latin and Welsh verb phrases is striking, especially because the 
Welsh author was probably aware of the parallel structures of Latin temet-ipse 
and Middle Welsh du-hun. It is unlikely, however, that the Welsh author would 
choose to calque the Latin reflexive strategy after having significantly altered 
the whole passage. 


(67) [Scias quum pro malis operibus quae gessisti. iam non potes salutem conse- 
qui nisi feceris quae dixero tibi. Abscide primum tua genitalia membra] 
et deinde interfice temetipsum. 
and then kill2SG.IMPV 2SGREFL-2SGINTS
[‘Know that for the bad deeds you have done, you cannot obtain redemption
unless you will do what I will say to you. First, cut off your genitals]
and then kill yourself.’ (Neuhaus 1886: 38 [The Pilgrim Girardus; BL
Cotton Cleopatra MS. C X]) 


(68) deinde interime te ipsum 
then kill2SG.IMPV 2SGREFL 2SGINTS
‘Then kill yourself.’ (Neuhaus 1886: 38 [The Pilgrim Girardus; BL Arundel
MS. 346]) 


(69) [na elly caffael yechyt am e pechaut ry wnaethost] 
ony ledy duhun 
unless kill2SG.PRES 2SGREFL
[‘you cannot get redemption from the sin you have done,] unless you kill
yourself.’ (Jones 1941: 24 [Gwyrtheu Mair; NLW MS. Peniarth 14])


A second instance of emphatic reflexive + intensifier in the Latin text has no 
correspondence in the Welsh version at all (70, 71). Later on, the Latin reflexive 
verb is rendered by the POSS-VN construction (72, 73). 


31 Cf. Väänänen (1981: 123), Puddu (2005: 206-223). 
(70) [At ille putans ueraciter eum sanctum esse Jacobum qui talia iuberet. arrepto
ferro membra uirilia abscidit. ac postea per guttur suum ferrum trahens.]
semetipsum ad mortem uulnerauit.
REFLACC-INTSACC to deathACC wound3SG.PERF
[‘And he [Girardus] believed, that it was Saint Jacob who ordered this,
and, seizing a knife, cut off his genitals. And after this, drawing the knife
against his throat,] he hurt himself deadly.’ (Neuhaus 1886: 38 [The
Pilgrim Girardus; BL Cotton Cleopatra MS. C X])


(71) Sef a oruc enteu o debygu en wir
this.is PTCL do3SG.PRET 3SGMASC of thinkVN PRED true
panyv yago ebostol oed ef gwneithur a orchymynassei
that.is Jacob apostle be3SG.IMPF 3SGMASC doVN PTCL order3SG.PLUPF
a marw vu.
and dead be3SG.PRET
‘This is what he did, (as he was) really thinking that it was the apostle
Jacob, he did what he had ordered and died.’ (lit.: ‘and he was dead.’)
(Jones 1941: 24 [Gwyrtheu Mair; NLW MS. Peniarth 14])


(72) et quod ad extremum se peremisset.
and that at endACC REFL kill3SG.PLUPF.SUBJ
‘and finally he had killed himself.’ (Neuhaus 1886: 39 [The Pilgrim Girardus;
BL Cotton Cleopatra MS. C X])


(73) ac en diwethaf e lad ehun.
and in last 3SGMASC.POSS killVN 3SGMASC.INTS
‘and in the end he killed himself.’ (Jones 1941: 25 [Gwyrtheu Mair; NLW
MS. Peniarth 14])


In the following case, the correspondence between the Old French source (74) 
and its Middle Welsh translation (75) is rather close. However, while the idiom 
‘to praise someone to the value of one glove’ does not seem to occur elsewhere 
in Middle Welsh, the verb moli X hun ‘to praise oneself’ is found also in the 
Llanstephan version of the Cynghorau Catwn (ex. 78). Theoretically, the Old 
French pronoun me could have triggered a pronominal construction in Middle 
Welsh, but instead fu hunan is used with reflexive function. 
(74) jeo ne me preyse le vailant de un gant.
I NEG 1SGREFL praise1SG.PRES theMASC worth of one glove
‘I do not praise myself to the value of one glove.’ (Stimming 1899: 68,
l. 1797 [Boeve de Haumtone])


(75) [a ffan ymladom ony ladaf i dy benn di yr mawr a’m cledeu.] 
ny  volaf fu hunan werth vn  uanec. 
NEG praise1SG.PRES 1SGREFL worth one glove
[‘and when we fight, if I do not cut off thy head, thou great fellow, with my
sword,] I will not praise myself to the value of one glove.’ (Williams 1892:
539 [Ystoria Bown de Hamtwn; NLW MS. Peniarth 5, page 134" (301): 22])


The following passage from Imago mundi by Honorius Augustodunensis contains 
a combination of the reflexive marker se and the intensifier ipse (76). To render 
the Latin seipsos ... corrodentes 'chewing up themselves', the author of the 
Welsh translation Delw y Byd gives two apparently synonymous versions (77). 
Both verbs seem to occur only once and were thus most likely specifically created 
for this passage. Interestingly, the two verbs that the author decided to use (the 
ym-verb ymdoddi based on toddi ‘to melt’ and the transitive verb cnoi ‘to bite, to 
chew’) are both constructed reflexively. Even though these very verbs were cus- 
tom-made to render the Latin version, both types of verbs already existed in 
Welsh. The ym-strategy was still productive at the time when the use of reflexive 
X hun started to spread. 


(76) [... praesertim cum me non mihi soli, sed toti mundo genitum intelligam,
omittens invidos tabescentes, non me],
sed seipsos livido corde corrodentes
but REFLACC-INTSACC jealousABL.SG.NEUT heartABL chewingACC.PL.MASC
[‘. . . above all, I understand not only my own birth, but the birth of the
whole world, leaving aside grieving individuals, who are chewing up not 
me,] but themselves with a jealous heart. . .’ (Flint 1983: 48-49 [Imago 
mundi]) 


(77) [Ac yn bennaf oll pryt na dyallwyf i vyg geni y my hun. mwy noc y ’r holl vyt 
gan ysgaelussaw y dynyon kyghoruynnus.] 
ac a ym-dodant e hunein ac a gnoant e hunein
and PTCL PV-melt3PL.PRES 3PLINTS and PTCL chew3PL.PRES 3PLREFL
o gallon gyghoruynnvs
of heart jealous
[‘And above all, since I do not only understand my own birth, but of the
whole world, neglecting jealous people] who gnaw themselves and who
chew themselves up with a jealous heart.’ (Delw'r Byd; Oxford Jesus
College 111, page 243' (976): 14)


In conclusion, it seems that for the translation of reflexives, Middle Welsh
authors resorted to a repertoire of constructions also found in non-translated texts
and did not try to imitate32 the Latin or Old French structures.33


4.3.3 Formal aspects 


In no less than six cases, reflexive X hun occurs with a second singular impera- 
tive verb, cf. (78) to (81). All instances come from two texts only: four from the 
Cynghorau Catwn and two from Y Groglith. 


(78) Na vawl dy hun yn ormod ac na chapla
NEG praise2SG.IMPV 2SGREFL too much and NEG reproach2SG.IMPV
dy hun yn ormod.
2SGREFL too much
‘Do neither praise nor reproach yourself too much.’ (Cynghorau Catwn;
NLW MS. Llanstephan 27, page 168": 16)
(79) [Pan gymhello dolur di yn irlloned rac kared dy weissyon.]34
kymedrola dy hvn hyt pan ellych arbet y rei
moderate2SG.IMPV 2SGREFL so that can2SG.SUBJ forgiveVN DEF ones


32 Cf. Winford (2003: 63-65) on “structural convergence”, i.e. imitation of the syntactic struc- 
tures of the contact language with the lexical means of one’s own language. A similar model is 
“replica grammaticalization”, developed by Heine and Kuteva (2003: 539). 

33 A different development took place in Breton, where the French influence was much stron- 
ger. The Breton prefix em- was equated with French se and became, combined with a pronoun, 
part of the preverbal reflexive and reciprocal marker Modern Breton en em, cf. Irslinger 
(2014b: 187, 199). 

34 Cf. Duff (1954: 602) for the Latin text: Seruorum culpa cum te dolor urguet in iram, ipse tibi 
moderare, tuis ut parcere possis. ‘If pain drives you in anger because of the fault of your serv- 
ants, moderate yourself, so that you can forgive the ones belonging to you/your people’. 
teu di.
2SGPOSS 2SG
[‘If pain drives you in anger because of the sin of your servants,] moderate
yourself, so that you can forgive the ones belonging to you.’ (Cynghorau
Catwn; NLW MS. Peniarth 3 part ii, page 37:3)


(80) os crist wyt ti gwna dihunan yn iach 
if Christ be2SG.PRES 2SG do2SG.IMPV 2SGREFL PRED safe
‘if you are Christ, save yourself’ (lit. ‘make yourself safe’) (Y Groglith;
Shrewsbury MS. 11, page 114:2) 


(81) [Hwnn a distryw temyl duw. ac ympen y tridieu a ’e hadeila.] 
iachaa dy hun. 
save2SG.IMPV 2SGREFL
[‘The one who destroyed the temple of God and rebuilt it after three days,]
save yourself.’ (Y Groglith; NLW MS. Peniarth 5, page 7: 18)


In (82) and (83) from the Cynghorau Catwn, the function of the pronoun di fol- 
lowing the verb is unclear. Objects of imperative verbs are invariably expressed 
by independent pronouns, not by infixed pronouns (82). In addition, indepen- 
dent pronouns occasionally code the objects of other verb forms (83) (Evans 
1964: 49-50). Postverbal di could thus be the object of the verb expressed by 
the independent pronoun, while the disambiguating intensifier indicates its co- 
reference with the subject coded in the verbal ending. 


Nevertheless, it seems more likely for di to code the subject and thus refer 
to the person expressed by the verb. Cynghorau Catwn contains three instances 
of this use with a transitive non-reflexive verb, like in (84). 


(82) Ardymhera di duhvn o 'r gwin. 
moderate2SG.IMPV 2SG 2SGREFL from DEF wine
‘Moderate (you) yourself from wine.’ (Cynghorau Catwn; NLW MS. 
Llanstephan 27, page 31: 3) 


(83) Kanys ot atnabydy di dy hun doeth wyt. 
for if know2SG.PRES 2SG 2SGREFL wise be2SG.PRES
‘For if you know (you) yourself, you are wise.’ (Cynghorau Catwn; NLW 
MS. Llanstephan 27, page 32: 20) 
(84) na chappla di arall am y bei
NEG blame2SG.IMPV 2SG another for DEF mistake
a vo arnat ti dy hun.
PTCL be3SG.PRES.SUBJ on2SG 2SG 2SGINTS
‘Do (you) not blame another for the mistake that is on yourself.’
(Cynghorau Catwn; NLW MS. Llanstephan 27, page 165”: 15)


4.3.4 Transition from one system to another 


To move from the old Middle Welsh system of reflexive marking to the new one, 
two simultaneous steps are necessary: 

- loss of the object pronoun or the prefixed ym- 

- reanalysis of X hun as object of the verb


This development is illustrated in (85) for the different structural types: 


(85) PRON + finite verb     [...]REFL trewis e hunINTS > trewis e hunREFL
     POSS-VN construction   yREFL brathu hi hunINTS > brathu hi hunREFL
     ym-verb                ymREFL-boeni e hunINTS > poeni e hunREFL


The reanalysis of Middle Welsh X hun seems thus natural enough, especially 
since Middle Welsh X hun already occurs as a head in a pronoun-like function 
coding the subject or after a preposition. It is however more difficult to explain 
why the preverbal pronouns and ym- prefixes were lost. 

One context in which this could have happened is that of the imperative
constructions discussed in 4.3.3, where infixed pronouns preceding the verb are
not possible. The hypothetical phrase in (86) contains an imperative verb
followed by an emphasising subject pronoun and an object pronoun plus intensifier.
The sequence of two second person singular pronouns with different functions 
is not attested and seems to be ungrammatical like English himself himself, i.e. 
the sequence of reflexive and actor-oriented intensifier in (87) from Gast and 
Siemund (2006: 360). The Middle Welsh object pronoun was dropped then, 
leading to expressions that are actually attested, both with and without an em- 
phasising subject pronoun (88, 89). 


(86) *ardymhera di ti du hvn
moderate2SG.IMPV 2SGSUBJ 2SGOBJ 2SGINTS
‘Moderate (*you) (*yourselfREFL) yourselfINTS / Mäßige du dich selbst!’


(87) He killed (himselfREFL) himselfINTS
Er tötete sich selbst


(88) ardymhera di du hvn
moderate2SG.IMPV 2SGSUBJ 2SGREFL
‘Moderate (*you) yourself / Mäßige du dich!’


(89) kymedrola dy hvn
moderate2SG.IMPV 2SGREFL
‘Moderate yourself / Mäßige dich!’


While this could be a starting point for the reanalysis of X hun, one wonders 
whether these pragmatically marked reflexive imperative clauses were frequent 
enough to trigger the change of system. This objection is reinforced by the fact 
that all instances contained in Rhyddiaith Gymraeg 1300-1425 cluster in just
two texts.

The small sample of 14 instances of reflexive X hun in the Middle Welsh cor- 
pora is thus not enough to formulate a strong hypothesis concerning the trigger 
of the change. More insights can probably be gained from the analysis of Early 
Modern texts, where the X hun reflexives become more frequent. 


5 Conclusions 


The quantitative study based on Rhyddiaith Gymraeg 1300-1425 showed that 
the alleged Middle Welsh ‘reflexive pronoun’ X hun functions, in fact, as an
intensifier in 99.32% of cases. Only 14 instances from both corpora
showed its use as a reflexive, which then became widespread in the modern
language. In Middle Welsh, ‘full reflexive’ events are coded by the prefix ym- or
by an infixed pronoun. Since both strategies are ambiguous, intensifiers are 
added for referent disambiguation and role disambiguation. The two strategies 
are about equally frequent and, to some degree, interchangeable. In some 
cases, a strategy may be blocked because of lexical or syntactic constraints. 
Although Evans (1964: 89) was aware that the reflexive use of X hun was 
only at its initial stages in Middle Welsh, he probably would be surprised to 
find out that the entire Middle Welsh corpus does not provide many more in- 
stances than the three examples that he cited in his Grammar of Middle Welsh. 
At present, it is impossible to determine what brought about such a change
in the Middle Welsh system; however, linguistic convergence with Latin or Old
French can certainly be excluded in light of the instances discussed above.

Finally, hypotheses on linguistic convergence between Welsh and English 
regarding the expression of reflexivity will have to take into consideration the 
scarcity of reflexive X hun in the Welsh corpus before 1425. 


Acknowledgement: This study was carried out within the project Detransitivity 
in the Brittonic languages: Reflexivity, reciprocity and middle voice constructions 
funded by the German Research Council. 


Joseph F. Eska and Benjamin Bruch 
11 Prolegomena to the diachrony 
of Cornish syntax 


1 Prelude 


Little scholarship has been conducted upon Cornish syntax, including the
configuration of the affirmative root clause.1 In this preliminary study, we examine
the configuration of this clause-type in Cornish from its earliest records through 
Late Cornish and attempt to establish what can be said about it, or, at times, 
what possibilities should be considered when nothing definitive may be said. 


2 Old Cornish 


We follow Schrijver (2011: 2-5) in the view that the earliest neo-Brittonic did not di- 
verge into discrete languages until the eighth century and that what have been 
termed Old Cornish and Old Breton did not diverge from each other prior to the elev- 
enth century. Until that time, they formed a unitary Old Southwest Brittonic. Under 
such an analysis, there are, in fact, no attested Old Cornish verbal sequences. 


2.1 VSO in Old neo-Brittonic 


On the basis of comparison with Old Welsh and Old Southwest Brittonic, we pre- 
sume that Old Cornish was VSO on the way to becoming V2. Cf. the following VSO 
clauses: 


(1) a. Old Welsh 
rodesit elcu guetig equs... 
give3SG.PRET Elcu then horse
‘Elcu then gave a horse . . .’ (Jenkins and Owen 1984, cited after Watkins
1987 [The “Surexit” Memorandum])


1 There is no controversy over the fact that negative root clauses and all embedded clauses, 
through all periods of the attestation of Cornish, were V1. A constituent is permitted to appear 
before the negator in negative clauses, but they look like other V1 clauses in that the verb is 
not third person singular but is conjugated. 


Open Access. © 2020 Joseph F. Eska, Benjamin Bruch, published by De Gruyter. This work
is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
License.
b. Old Southwest Brittonic
dadarued epac(dou)... int rid ou mod
occur3SG.PRES epacts PTCL free 3PLPOSS manner
‘Adviennent les épactes sans obstacle [The epacts occur without
hindrance].’ (Fleuriot 1964b: 412 [Bibliothèque Municipale d'Angers, MS. 477])


On the basis of compound or negated verbal forms such as Old Welsh imm-it-cel
(Lambert 2003: 113, gloss 85; Schrijver 2011: 49) ‘it conceals itself’ and Old
Southwest Brittonic ni-s-guilom (Fleuriot 1964b: 262 [Bibliothèque Municipale
d’Angers, MS. 477]) ‘nous ne la voyions pas [we would not see it]’, which
contain object agreement affixes that continue pronominal morphemes (which
are known to occur in the lower left periphery of the clause), we understand
the verb in these VSO clauses to occupy Fin and to move there synchronically
because Fin bears uninterpretable φ-features (viz. person, number), i.e. [uφ],
that trigger movement through T into the left periphery.2


2.2 V2 in Old neo-Brittonic 


There is also evidence for V2 root clauses in Old neo-Brittonic. Borsley, Tallerman, 
and Willis (2007: 290) cite one token from Old Welsh and Fleuriot (1964b: 413)
several from Old Southwest Brittonic:3


(2) a. Old Welsh
[Gur dicones remedaut elbid] a 'n guorit
man make3SG.PRET wonder world PTCL 1PLINFX redeem3SG.PRES
‘The man who created the wonder of the world redeems us.’ (Williams
1980: verse 5*^? [Juvencus Englynion])

b. Old Southwest Brittonic
[do(u) cuntraid]... a int im pop un mis
two neap.tide PTCL be3PL.PRES in each one month
‘Deux marées de morte-eau . . . sont dans chacun mois [There are two
neap-tides every month].’ (Fleuriot 1964b: 413 [Bibliothèque Municipale
d'Angers, MS. 477])


2 An uninterpretable feature must be checked for a clause to be interpretable. In this instance,
the φ-features on the verb check the uninterpretable φ-features in Fin by movement of the
verb into Fin. For further discussion, see Svenonius (2007).

3 See Fleuriot (1964a: 151) for the full text of the token cited. All glossed examples cited after 
other authors adopt their respective glossing. 
It appears likely that these V2 clauses developed out of cleft-type constructions
(Manning 2001; Willis 2009: 146-147). From the synchronic point of view, such
clauses, like V1 clauses, bear [uφ] features on Fin that draw the verb into the
left periphery, but, unlike V1 clauses, also an Edge Feature that causes an XP to
move into SpecFinP, which then may move higher into the left periphery.


3 Middle Cornish 


There is considerable diversity of opinion about the configuration of affirmative
root clauses in Middle Cornish as reflected in the title of Mark Kille's Harvard
University B.A. Honours thesis, “What thing is next I don't quite know”: An analysis
of variation in word order and subject-verb agreement in Middle Cornish (1995).
Lewis (1946: 47), followed by Kille (1995: 5), maintains that both SVO and V1 are
unmarked configurations in Middle Cornish. Though he notes that a variety of
constituents can precede the verb, and also observes that V1 occurs and,
importantly, that some forms of bos ‘be’ require a V1 configuration, Williams (2011: 336)
suggests “that Middle Cornish is in essence an SVO language”. George (1991)
compiles all of the surface configurations attested in the play Beunans Meriasek
(BMer., composed ca 1500, edited in Stokes 1872), but is satisfied to conclude by
listing only the most common ones.


3.1 Surface V2 in Middle Welsh and Middle Breton 


It is clear that affirmative root clauses in Middle Welsh and Middle Breton are 
V2, i.e. the preverbal XP is not restricted to the Subject, but may be also an 
Object or Adverb(ial). The post-verbal position of the Subject in (3b) and (4c) is
diagnostic of the V2 character of these clauses. The following tokens are cited 
after Borsley, Tallerman, and Willis (2007: 287-290): 


4 The translated quotation is from BMer. (line 107): pandryv nessa ny won fest. 

5 For unequivocal demonstrations that Middle Welsh affirmative root clauses bear V2 configu- 
ration, see Willis (1998) and Meelen (2016). Middle Breton has not been the focus of similar 
studies, but see Schafer (1995) and Borsley and Kathol (2000) for Modern Breton. 
(3) Middle Welsh

a. Subject-initial
[Riuedi Mawr o sswydwyr] a gyuodassant yuynyd...
numbers large of officials PTCL rise3PL.PRET up
‘Large numbers of officials got up . . .’ (PKM 16.18-19)

b. Object-initial
Ac [ystryw] a wnaeth y Gwydyl
and trick PTCL make3SG.PRET DEF Irish
‘And the Irish played a trick.’ (PKM 44.11)

c. Adverb(ial)-initial
[Yn Hardlech] y bydwch seith mlyned ar ginyaw...
in Harlech PTCL be2PL.FUT seven years at dinner
‘In Harlech you will be at dinner for seven years . . .’ (PKM 45.2-3)


(4) Middle Breton

a. Subject-initial
[Cesar] a respontas deze...
Caesar PTCL reply3SG.PRET to3PL
‘Caesar replied to them...’ (Ernault 1887a: 82 § 12 [La vie de sainte
Catherine])

b. Object-initial
hac [an holl doueouse]... a meux an oll dispriset. . .
and DEF all gods-PROX PTCL have DEF all renouncePAST.PTCPL
‘. . . and I have renounced all those gods . . .’ (Ernault 1887a: 80 § 8)

c. Adverb(ial)-initial
hac [encontinant] ez aparissas an eal dezy
and immediately PTCL appear3SG.PRET DEF angel to3SG.FEM
‘. . . and immediately the angel appeared to her.’ (Ernault 1887a: 84 § 13)


3.2 Surface V2 in Middle Cornish 


Recent theoretically oriented scholarship, e.g. Borsley, Tallerman, and Willis (2007: 
291), notes that identical structures are found in Middle Cornish, some tokens of 
which follow: 
(5) a. Subject-initial
[ny] a th wor the penan gluas
1PL PTCL 2SGINFX put3SG.PRES to Land’s End
‘We will bring you to Land’s End.’ (BMer. l. 594)6

b. Object-initial
[guyr] a gousaf vy
truth PTCL speak1SG.PRES 1SG
‘I speak truth.’ (Norris 1859, 2: l. 909)

c. Adverb(ial)-initial
[ragon] y pesys y das
for1PL PTCL beseech3SG.PRET 3SGMASC.POSS father
‘For us he beseeched his father.’ (Stokes 1860-1861: 6, stanza 9°, [The
Passion])


3.3 Surface V3* in Middle neo-Brittonic7


All of the Middle neo-Brittonic languages allow more than a single constituent 
to occur before the verb, though, by and large, only one of these constituents 
may be an argument,8 e.g.:


(6) a. Middle Welsh 
ac [ar hynny] [at Uath uab Mathonwy] yd aethant wy 
and on DIST to Math uab Mathonwy PTCL go3PL.PRET 3PL
‘And thereupon they went to Math uab Mathonwy.’ (PKM 68.15-16) 


6 Though we cite the standard edition and translation by Stokes (1872), we occasionally si- 
lently adopt improvements in the text and translation by Ray Edwards in Syed and Edwards 
(1996). 

7 The asterisk indicates that the verb appears in third or later position in the clause. 

8 The reason for this constraint requires further research. Under the view that V2 structures in the 
Brittonic languages developed out of cleft-type structures (see section 2.2), a preliminary hypothe- 
sis may be that since only a single argument can appear before the relative pronoun in a cleft 
structure, as affirmative root clauses were interpreted as V2, there would not have been any evi- 
dence for a language learner that more than a single argument could appear in the left periphery. 
b. Middle Breton
[breman] [a crenn] [me] a gourchemen dit
now PTCL plainly 1SG PTCL ask3SG.PRES to2SG
‘Now, I plainly ask of you.’ (Ernault 1887b: 256 l. 240 [Vie de sainte
Nonn])


c. Middle Cornish
ha [my] [lemmen] a 'th vygeth
and 1SG now PTCL 2SGINFX baptise3SG.PRES
‘And now I will baptise you.’ (BMer. l. 941)


These structures do not undermine the V2 analysis, for the important
characteristics of the configuration are that the [uφ] features on Fin draw the verb into
the lower left periphery and that its Edge Feature causes an XP to move into 
SpecFinP. 


3.4 The architecture of the left periphery 


Since Rizzi (1997), it has become clear that the left periphery of the root clause 
is highly articulated (see further Poletto 2002; Benincà and Poletto 2004; Rizzi 
2004, 2013; Haegeman 2012, inter alios). Under this analysis, the communis opi- 
nio understands the hierarchical architecture of the left periphery to be: 


(7) [FrameP [ForceP [TopP [FocP [FinP ... ]]]]]


Within this framework, FrameP is the locus of scene-setting locatival and tem- 
poral adverb(ial)s and hanging topics, ForceP is the locus of markers of illocu- 
tionary force and clause-typing, TopP is the locus of topical XPs, FocP is the 
locus of focussed XPs, and FinP expresses the finiteness or non-finiteness of 
the clause. 

However, in a more refined analysis, Frascarelli and Hinterhölzl (2007: esp.
88) propose that there are, in fact, three different types of topics, each of which
is projected separately within the left periphery. In this regard, Hinterhölzl and
Petrova (2010: 320-321) write:


(a) ABOUTNESS TOPIC: “what the sentence is about” (Reinhart, 1981; Lambrecht, 1994),
“what is a matter of standing and current interest or concern” (Strawson, 1964);

(b) CONTRASTIVE TOPIC: an element that induces alternatives which have no impact on the
focus value and creates oppositional pairs with respect to other topics (Kuno, 1976;
Büring, 1999);

(c) FAMILIAR TOPIC: a given, D-linked constituent, which is typically destressed and realised
in a pronominal form (Pesetsky, 1987).9


The hierarchical architecture of the left periphery in (7) is, thus, expanded to: 


(8) [FrameP [ForceP [AbTopP [ContrTopP [FocP [FamTopP [FinP ... ]]]]]]]


Poletto (2002) was the first to propose that the locus of the V2 phenomenon can 
be either FinP + ForceP or FinP alone. In the former, the verb and initial XP 
move through FinP to ForceP, as in (9), thus severely restricting the number of 
constituents that can appear before the verb. 


(9) [FrameP [ForceP XP [Force V] [AbTopP [ContrTopP [FocP [FamTopP [FinP XP [Fin V] ... ]]]]]]]


In the latter, however, the verb and XP remain in FinP, as in (10), and FrameP, 
ForceP, AbTopP, ContrTopP, FocP, and FamTopP may all host constituents that 
precede the verb. 


(10) [FrameP [ForceP [AbTopP [ContrTopP [FocP [FamTopP [FinP XP [Fin V] ... ]]]]]]]
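
The contrast between the two loci of V2 can be made concrete with a small sketch. It is only an illustration, not part of the analysis: the projection labels follow (8), while the function names and the idealisation that each projection hosts at most one preverbal constituent are assumptions introduced here.

```python
# Illustrative model only: the projection labels follow (8); the function
# names and the one-constituent-per-projection idealisation are assumptions.
LEFT_PERIPHERY = ["FrameP", "ForceP", "AbTopP", "ContrTopP", "FocP", "FamTopP", "FinP"]


def preverbal_hosts(v2_locus):
    """Projections that can host a constituent to the left of the verb.

    v2_locus is "Force" when verb and XP move on to ForceP, as in (9),
    and "Fin" when they remain in FinP, as in (10).
    """
    if v2_locus not in ("Force", "Fin"):
        raise ValueError("v2_locus must be 'Force' or 'Fin'")
    cutoff = LEFT_PERIPHERY.index(v2_locus + "P")
    return LEFT_PERIPHERY[: cutoff + 1]


def fits_hierarchy(labels, v2_locus):
    """True if the labelled preverbal constituents follow the top-down order
    of the hierarchy and use only projections available for this V2 type."""
    hosts = preverbal_hosts(v2_locus)
    if any(label not in hosts for label in labels):
        return False
    positions = [hosts.index(label) for label in labels]
    return all(a < b for a, b in zip(positions, positions[1:]))


print(preverbal_hosts("Force"))
print(fits_hierarchy(["FrameP", "ForceP"], "Force"))
print(fits_hierarchy(["FrameP", "ForceP", "AbTopP", "FocP", "FamTopP"], "Fin"))
print(fits_hierarchy(["FrameP", "ForceP", "AbTopP", "FocP", "FamTopP"], "Force"))
```

Under these assumptions, a sequence of constituents labelled FrameP, ForceP, AbTopP, FocP, FamTopP is admitted only when the verb stays in Fin, which is the sense in which a Fin V2 grammar is less restrictive than a Force V2 grammar.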


We may look to medieval Romance for an illustration. Wolfe (2016, 2018) discusses 
this microvariation and demonstrates that later Old French is a Force V2 language 
and restricts the number of constituents that can precede the verb. In (11), a 
frame-setting clause appears in SpecFrameP (adapted from Wolfe 2018: 69): 


(11) Et [FrameP quant il est apareilliez, [ForceP il [Force prent] ses
and when he be3SG.PRES appearPTCPL he take3SG.PRES his
armes et monte...]]
weapons and ride3SG.PRES
‘When he appeared, he took his weapons and rode . . .’ (Pauphilet 1923:
1, 129 [La queste del Saint Graal])


In Wolfe's sample, there are but two tokens of V4 out of 632 clauses (0.32%).
On the other hand, in Wolfe's sample of 622 clauses in Old Occitan, not only
does V3 occur more often than in later Old French,10 but V4 occurs in 8.04% of
clauses, as well as V5 in 1.29% and V6 in 0.64%, none of which appear in Old


9 With regard to the givenness/accessibility characteristic of familiar topics, cf. also Chafe 
(1987). 
10 Old Occitan 29.74%, Old French 24.53%. 
French. Thus, V5 clauses, such as (12), are possible in Old Occitan, but not in
later Old French (cited after Wolfe 2018: 68): 


(12) E [per aisso], [illi] [adoncs], [am gran confusion] comandet
and for this she therefore at great confusion command3SG.PRET
a totas
to all
‘Because of this, amongst great confusion, she commanded everyone to
...’ (Albanés 1879: 96 § 41 [La vie de sainte Douceline])
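
The proportions reported from Wolfe's two samples can be checked with simple arithmetic. The sketch below is only a sanity check: the sample sizes and percentages are those given above, whereas the clause counts for Old Occitan are back-computed from the percentages and are not reported in the text; the function names are likewise illustrative.

```python
def share(count, total):
    """Percentage of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


def implied_count(percent, total):
    """Nearest whole number of clauses corresponding to a reported percentage."""
    return round(percent / 100 * total)


OLD_FRENCH_TOTAL = 632   # clauses in the later Old French sample
OLD_OCCITAN_TOTAL = 622  # clauses in the Old Occitan sample

# Reported directly: two V4 tokens in the Old French sample.
print(share(2, OLD_FRENCH_TOTAL))

# Back-computed (an assumption, not reported): counts implied by the percentages.
occitan_percent = {"V4": 8.04, "V5": 1.29, "V6": 0.64}
for label, percent in occitan_percent.items():
    count = implied_count(percent, OLD_OCCITAN_TOTAL)
    print(label, count, share(count, OLD_OCCITAN_TOTAL))
```

If the back-computed counts are right, the Old Occitan sample contains far more V4 clauses than the two found in later Old French, in samples of nearly equal size.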


It is this ‘relaxed’ instantiation of the V2 phenomenon that we find in Middle 
neo-Brittonic languages. A definitive V6 token is cited by Borsley, Tallerman, 
and Willis (2007: 293) after Poppe (1991: 178), with an analysis in the present 
framework indicated: 


(13) Ac [FrameP o 'r dywed] [ForceP gan wuyhaf grym a llafvr]
and of DEF end with greatest power and toil
[AbTopP gwedy kaffael o 'r Brytanyeyt penn e mynyd],
after get of DEF Britons top DEF mountain
[FocP en e lle] [FamTopP wynt] a lle) dangossassant...
in DEF place 3PL PTCL place show3PL.PRET
‘And in the end with the greatest power and toil once the Britons had
gained the top of the mountain in that place they showed . . .’ (Roberts
1971: lines 795-797 [Brut y brenhinedd])


3.5 Preverbal Object DP + pronominal Subject 


George (1991: 216) calls attention to tokens of an Object DP + pronominal
Subject + affirmative particle a + verb construction and labels it “a valid one”
(which we interpret to mean that he believes it to be generated by the grammar)
upon the basis of the fact that there are 29 tokens of it in Beunans Meriasek,
five of which, he states, “are [not] dependent on the rhyme”. He regards it as a
Cornish innovation (George 1990: 229-230, 239-240). Such constructions are
found not only in Beunans Meriasek, but in earlier texts, as well, e.g.:
(14) a. ag [ol 3e vo3] [hy] a wra
and all 2SGPOSS will 3SGFEM PTCL do3SG.PRES
‘And she will do all of your will.’ (Toorians 1991: l. 20 [The Middle
Cornish Charter])
b. ha [henna] sur [my] a greys
and DIST surely 1SG PTCL believe3SG.PRES
‘. . . and I surely believe that.’ (Norris 1859, 1: l. 1263 [Origo Mundi])


We understand the pronominal Subjects in the Middle Breton clause in (6b) 
and the Middle Welsh clause in (13) clearly to be hosted in SpecFamTopP. For a 
clear Middle Cornish example, consider the following passage, in which Teudar 
addresses St Kea: 


(15) Mars o Christ Du marrajak,    ‘If Christ was God so gracious,
     pew o e das?                  who was his father?
     Te pen boba lagajak,          You goggle-eyed head of a clown,
     ro gorthyb vas.               give a satisfactory answer.
     Bith war! Na fal!             Watch out! Don't fail!’
(Thomas and Williams 2007: lines 208-212 [Bewnans Ke])


Second person singular deixis having been established, the succeeding line is:


(16) [Anotho] [te] re gowsys
of3SG.MASC 2SG PERF speak3SG.PRET
‘You have spoken of him.’ (Thomas and Williams 2007: l. 213 [Bewnans Ke])


There can be little doubt but that the pronominal Subject is hosted by
SpecFamTopP.11


3.6 More on preverbal Object DP + pronominal Subject


Though there clearly are available Specifier positions to host an Object DP within
the left periphery while a pronominal Subject is hosted by SpecFamTopP, we are
not certain that such is the correct analysis, for, as mentioned in section 3.3, a


11 For a discussion of this construction in Old English and Old High German, see Walkden 
(2015). 
constraint exists against more than one argument appearing in the left periphery.
It may be that all tokens of this construction are not generated by the grammar, 
but represent an over-determination of it by the necessity to enable a rhyme 
(here marked by underlining: single and double, dashed and wavy). Consider 
the surrounding context of the two tokens of this construction cited in (14): 


(17) a. lemen y3 torn my as re           ‘Now I give her into your hand,
        ha war en grey3 my an te         and upon the. . ., I swear it,
        nag vs y far                     there is not her equal
        an bar3ma 3e Pons tamar          from here to the Tamar bridge.
        {ad} my ad pes worty by3 da      I beg you, be good to her,
        ag ol 3e vo3 hy a wra            and she will do all your will,
        rag flog yw ha gensy so3,        for she is a child, and...
        ha gassy 3e gafus y bo3.         And allow her to have her will.’
        (Toorians 1991: lines 15-22 [The Middle Cornish Charter])¹³


12 We recall Jakobson's (1923: 16 = 1979: 15) “teoriju organizovannogo nasilija poètičeskoj
formy nad jazykom [theory of organised violence of poetic form over (natural) language]”. For
an example, consider the twelfth stanza of the early Irish poem Fo réir Choluimb céin ad-fías
[As long as I speak, (may I be) obedient to Columb]:


Do-ell Erinn, indel cor,        ‘He turned away from Ireland, having made covenants (?),
cechaing noib nemed mbled,      he traversed in ships the whales' sanctuary,
brisis tola, tindis for,        he broke desires, he was illuminated (?),
fairrge al druim dánae fer.     A brave man over the ridge of the sea.’ (Kelly 1973: §12)


In this poem of 4 | 3 heptasyllabic lines, one finds linking alliteration between the last word of
a line and the first word of the succeeding line, line-internal alliteration, and rhyme. As
observed by Watkins (1995: 121), the words in the final line of this stanza occur virtually in the
reverse of unmarked order, viz. Fer dánae al druim fairrge, in order to enable the 4 | 3 scansion
and retain the alliterative patterns:


fairrge  al    druim      dánae              fer
sea.GEN  over  ridge.DAT  brave.NOM.SG.MASC  man.NOM
‘a brave man over the sea’s ridge’


13 The text is difficult in places. We provide the reference to the most recent published edition
and translation, but provide the reading of Bruch (2005: 335) and his unpublished translation
when either is uncertain.

     b. me a vyn mos the’n temple        ‘I will go to the temple,
        ha dev ena a worthye             And will worship God there,
        kepar del goth thy'mmo vy        As it is incumbent on me
        ef yv arluth nef ha’n beys       He is Lord of heaven and earth,
        ha henna sur my a greys          And that I surely believe
        a luen colon pur theffry         With full heart, very earnestly.’
        (Norris 1859, 1: lines 1259-1264)


It is clear from these passages that the preverbal Object DP + pronominal 
Subject construction exists to enable the rhyme. 


3.6.1 Apparent exceptions 


There are apparent exceptions to the hypothesis that preverbal Object DP +
pronominal Subject constructions are always poetic overdeterminations, but we
are uncertain whether they ought to be considered as authentic instances of
Middle Cornish clausal configuration. We provide an example. Consider the
following stanza:


(18) Arluth henna ny a ra            ‘Lord, that we will do.
     desempys duen alema             Straightway let us go hence.
     aspyans pup ay quartron         Let every one spy from his corner.
     me agis gyd rum ena             I will guide you, by my soul,
     pur uskis bys in cambron        Very quickly, as far as Camborne.’
     (BMer. lines 978-982)


The first verbal clause of this stanza is glossed as:


(19) [henna]  [ny]  a     ra
     DIST     1PL   PTCL  do.3SG.PRES
     ‘We will do that.’ (BMer. l. 978)


14 See the appendix for a discussion of the five tokens that George states are not the result of
poetic overdetermination.


324  Joseph F. Eska and Benjamin Bruch


Clearly the configuration of this line, as can be seen in (18), enables the rhyme,
but one may note that the configuration *ny a ra henna would also enable the
rhyme, while yielding a configuration with only one argument before the verb,
hence the position of henna must be serving a pragmatic purpose and is not
merely the result of poetic overdetermination. Indeed, this is correct, but there
is more to be said. Consider the immediately preceding stanza addressed by
Teudar to the torturers:


(20) Meryasek ythyv gelwys           ‘Meriasek is he called:
     in crist yma ov cresy           In Christ he believes.
     genogh why bethens sesijs       By you let him be seized
     gruegh y tormontya besy         Do ye torment him.
     crist mar ny veth denehys       If Christ be not denied,
     pegh then horsen trewesy        A thrust to the doleful whoreson!
     genogh kynfove lethys           Though he be slain by you,
     me agis menten defry.           I will maintain you certainly.’


(BMer. lines 970-977) 


In context, then, it is clear that henna in (19) is a response to the directions in the
passage in (20), and, thus, that it, in fact, has been moved into the left periphery to
occupy SpecAbTopP. But we must also look at the position of the pronominal
Subject in this line and, indeed, determine why it is present at all. Were ny not in
preverbal position, we might expect the clause to appear as *Arluth, henna a ren
with a conjugated verb, but not only would such a line not rhyme, it would also be
a syllable short. In Beunans Meriasek, a pronominal Subject, though not required
after a conjugated verb, could be included in order to make rhyme or
syllable-count, but *Arluth, henna a ren ny would not provide the necessary rhyme in this
token. In this clause, it is not that the Object DP has been displaced by poetic
overdetermination, but that the grammatically unnecessary pronominal Subject has
been inserted in order to enable the correct syllable-count.¹⁵


3.7 Preverbal Subject DP + Object DP constructions 


In this section, we merely observe that, just as the preverbal Object DP +
pronominal Subject construction exists to enable the rhyme and/or syllable-count,
constructions with both Subject DP and Object DP before the verb, in either
order, occur for the same reason, e.g.:


15 Indeed, chevilles are very commonly employed in Middle Cornish verse in order to achieve 
the necessary number of syllables in a line. 

(21) a. [dew  sen]  [crist]  a     3anvonas
        two   man   Christ   PTCL  send.3SG.PRET
        3e  berna   boys  ha   dewas
        to  buy.VN  food  and  drink
        ‘Christ sent two men to buy food and drink.’ (Stokes 1860-1861: 16,
        stanza 42 [The Passion])

     b. [mab  marya]  [mur    a   beyn]
        son   Mary    great   of  pain
        a     wo3evy           y   n    vr    na
        PTCL  endure.3SG.IMPF  in  DEF  hour  DIST
        ‘The son of Mary endured great pain in that hour.’ (Stokes 1860-1861:
        18, stanza 54 [The Passion])


A good illustration of the extent to which displacement can occur in order to 
enable necessary rhyme and syllable-count is the following, the first four lines 
of an ABABABAB stanza: 


(22) [pur   wyr]   [certan]  [an   den  ma]
     very   true   certain   DEF   man  PROX
     [lyes  den]  re    wruk         treyle
     many   man   PERF  do.3SG.PRET  turn.VN
     [agan      laha]  [ef]      yma¹⁶
     1PL.POSS   law    3SG.MASC  be.3SG.PRES
     [pup    v]     ow    contradye
     every   hour   PROG  contradict.VN
     ‘Truly, this man certainly has converted many men. He is always
     opposing our law.’ (Norris 1859, 1: lines 2423-2426 [Passio Christi])


3.8 Some comments on variation in Middle Cornish texts 


The number of clauses that deviate from affirmative root clause V2 in the Middle
Cornish corpus varies amongst texts. Deviation is fairly common in the poem
Pascon agan Arluth (ca. 1400), but considerably less frequent in the two saints’
plays, Beunans Meriasek and Bewnans Ke, which were probably written around a
century later. The Ordinalia cycle of mystery plays, which is roughly
contemporaneous in date of composition with Pascon agan Arluth, does not exhibit nearly as
many tokens of such divergent affirmative root clauses as the poem. This is
particularly telling when we consider that the central play of the Ordinalia cycle, Passio
Christi, contains some lines which are also found in Pascon agan Arluth, although
scholarly opinion is divided as to whether the play or the poem is the older text,
or original source of the lines in question.¹⁷ It is worth considering, however, that
the configurational differences amongst the texts may relate to the fact that Pascon
agan Arluth is a poem, while the Ordinalia, Beunans Meriasek, and Bewnans Ke are
plays.


16 Note that yma is one of the few verbs that usually requires V1 configuration even in
affirmative root clauses. The occurrence of an adverb(ial) or participle to the left of yma is not
unusual, but a DP in that position is very unusual. Note, furthermore, that, in Breton, emañ is
employed only after an adverb(ial) or participle, while Subject + V + Complement requires a
zo; this construction does not have an equivalent in Cornish, however.


The medieval Cornish dramas seem clearly to have been written as texts to
be performed aloud by actors. Evidence from historical records and in the
manuscripts of the plays themselves suggests that they were actually staged (Joyce
and Newlyn 1999: 541-558; Bakere 2009: 213). But, as a poem, Pascon agan
Arluth may have been written as a more purely "literary" text, to be read privately
or even silently, rather than recited or performed for an audience. It is, therefore,
possible that the language and number of configurational liberties found in it
represent a more "literary" register of Cornish, while the plays stay closer to the
norms of the spoken language, though displaying divergent clausal
configuration when demanded by rhyme and syllable-count.

Another factor is the type of verse form that Pascon agan Arluth employs.
Whereas the other works of Middle Cornish verse are written in a wide variety of
stanza forms, most of which require only three or four pairs of rhyming lines
(such as AABCCB or ABABCDDC, in which there are two A lines, two B lines, and two C
(and D) lines), Pascon agan Arluth is written almost exclusively in eight-line
stanzas rhymed ABABABAB, in which the poet is obliged to supply two sets of four
rhyming words, i.e. four A rhymes and four B rhymes.¹⁸ It may be that these much
stricter rhyming requirements are what spurred the author of Pascon agan Arluth
(who, unlike the Middle Welsh or Irish bards, was probably not a professional or
highly trained poet) to scramble the constituent order of his affirmative root
clauses quite freely simply to get an appropriate syllable at the end of each line.
It seems as though some of the most divergent affirmative root clause
configurations in the Ordinalia also occur in stanzas with the ABABABAB rhyme scheme, such
as that in (22), suggesting that it was the need to provide a particular end rhyme, more
than any other consideration, that prompted poets to depart so widely from the V2
norm that likely was typical of spoken Middle Cornish.


17 Nance (1949: 368), Murdoch (1981: 823-826), Williams (2006: 66), and George (2010: 493)
believe that Pascon agan Arluth is the older text. Fowler (1961: 104-111) takes the opposite
view. Bruch's current opinion tends to side with Fowler upon the basis of metrical evidence.

18 Bruch (2009: 90-91) remarks upon the difficulty of finding the necessary rhyming words for
an ABABABAB stanza in Middle Cornish, and suggests that this may be a reason why this rhyming
pattern is employed less over time, occurring less commonly in Beunans Meriasek than in the
Ordinalia, and not being present at all in the surviving text of Bewnans Ke.
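
The rhyme-scheme arithmetic in this section can be illustrated with a minimal sketch. It assumes only that a stanza form is written as a string of rhyme letters, as in the text; the function names are hypothetical and introduced here for illustration. It counts how many line-endings each rhyme must supply, which shows why ABABABAB is more demanding than forms built from rhyming pairs.

```python
from collections import Counter


def rhyme_demands(scheme):
    """Return, for each rhyme letter, how many line-endings must share that rhyme."""
    return dict(Counter(scheme))


def largest_rhyme_set(scheme):
    """Return the size of the largest set of mutually rhyming lines in the stanza."""
    return max(Counter(scheme).values())


# The three stanza forms named in the text (illustrative input only).
for scheme in ("AABCCB", "ABABCDDC", "ABABABAB"):
    print(scheme, rhyme_demands(scheme), largest_rhyme_set(scheme))
```

On this count the pair-based forms never need more than two words per rhyme, whereas ABABABAB needs four for each of its two rhymes.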


4 Late Cornish 


General opinion is that affirmative root clauses in Late Cornish, presumably 
under the influence of English, had become SVO (e.g. Jenner 1904: 158; Lyon 
and Pengilly 1987; Wmffre 1998: 62-63; Brown 2001: 248-249; Gendall 2004: 
98-100, 140; Williams 2011: 336). This may be, but the textual sources are often 
not of high trustworthiness. 


4.1 The Tregear homilies: A transitional text 


Williams (2011: 336) states that the Tregear homilies, dated to ca. 1558, are
characterised by some Middle Cornish features and some Late Cornish features.¹⁹ He
states that they are written “in fairly colloquial prose” (Williams 2011: 336); indeed,
in his judgement, Tregear’s “morphology and syntax are perfect” (Williams 2011:
338). However, a close examination of the text, a translation of Edmund Bonner's
A profitable and necessary doctrine, with certayne homelyes adioyned therevnto
(1555), indicates that Tregear’s translation closely follows the clausal configuration
of his English exemplar, violating a variety of features of Cornish syntax. We note,
furthermore, that Tregear deletes, enlarges, or paraphrases portions of Bonner’s
text. See the appendix for an illustration of his translation practice, several points
concerning which we discuss in the following subsection.


19 For example, Tregear’s language generally evinces the Late Cornish unrounding of /œ/ so
as to merge with /e/ and shows frequent use of periphrastic constructions in preference to
conjugated verbs, but does not provide any trace of the Late Cornish pre-occlusion of nasals and
unrounding of /y/ to /i/.


4.1.1 Comments upon John Tregear’s translation practice


The Tregear homilies appear likely to be a poor, overly hasty, and unpolished
translation of Bonner, in which Tregear, though a native speaker of Cornish, tends
to preserve the configuration of constituents in his English exemplar whenever
possible (perhaps because he was translating one phrase at a time). It is clear that
he was influenced by English lexis and syntax to the extent that he
introduces English words and configurational syntax, e.g. Adj + N relative ordering,
not only in untranslated borrowings such as sufficiant cawse ‘sufficient cause’ in
folio 1, l. 5, but also in his own additions to the text, such as perfect colonow
‘perfect hearts’ in folio 1, l. 3, in which the noun is Cornish. This may be evidence
that, for educated bilinguals such as Tregear, there was a kind of “priests’
Cornish” equivalent to the Breton brezhoneg beleg ‘priests’ Breton’, which
incorporated elements of French lexis and syntax (see Williams 2006: 189).

Though Tregear gives the impression that he wishes to stay faithful to
English configurational syntax wherever possible, there are instances in which
his knowledge of Cornish encourages him to make a different
choice. Examples of this include tokens in which he substitutes Cornish V1 for
English SVO in negative root clauses and embedded clauses, e.g.:


(23) a. ny   rug          eff       leverall  in  pegh  mas  i
        NEG  do.3SG.PRET  3SG.MASC  say.VN    in  sin   but  in
        'n   plural  number,  in  pehosow.
        DEF  plural  number   in  sins
        ‘He saieth not sinne, but in the plural number, sinnes.’ (folio 8, l. 14)

     b. lymmyn  pan   rug          du ...  creatya    ha   gull     den
        now     when  do.3SG.PRET  god     create.VN  and  make.VN  man
        ‘Now when god had... cre[a]ted man. . .’ (folio 2, lines 18-19)


There are also clausal tokens in which he breaks up long English sentences or
reorders the constituents of those sentences so as to produce coherent Cornish
sentences that are closer to the syntax of the traditional texts (including,
perhaps, a preference for V2 configuration in affirmative root clauses). Tregear is
not reluctant to place two DPs before the verb in a root clause, at least in
instances in which his English exemplar does, so ‘[onely one kynde of fruyte] [he]
charged hym . . . ’ becomes [Saw vn kynda a frut] [an tas dew] a chargias
mabden... ‘only one kind of fruit, God the Father charged mankind’ in folio 4,
lines 2-3. Since two DPs can occur before the verb in Middle Cornish verse
texts, perhaps such phrasing sounded acceptable to Tregear as a Cornish
speaker. On the other hand, Tregear renders the following clause from folio 4,
l. 3, in which he wishes to construct an embedded negative clause meaning
‘that he might not meddle with or touch it [i.e. the fruit],’ as na rella myllya na
tuchia worta with V1 (Aux [S] V PP) configuration, not S Aux V PP as it would
be in English.²⁰

Tregear is perhaps most useful as a source of negative evidence concerning
the syntax of Cornish, since there appear to be clauses in which he chooses (or
is forced) to avoid duplicating the English clausal configuration, even though
the latter seems to be his default approach to translating Bonner's prose. Like
Breton clergy, he seems quite comfortable with producing overdetermined DPs
such as an frut an wethan for Bonner's ‘the fruyte of the tree,’ for which *frut an
wethan is expected. But when faced with ‘The Prophette, Dauid . . . alledgeth,’
Tregear produces progressive yma an profet dauid ow allegia ‘The Prophet
David is alleging,’ since this is how he usually expresses the simple present
tense in Cornish, employing yma + verbal noun and the V1 configuration
normal for this verb (Williams 2016: 120-124) or the progressive particle ow +
verbal noun.²¹

In our view, the Tregear homilies cannot be employed to establish much 
about Middle or Late Cornish affirmative root clause configuration. 


4.2 The writings of the Boson family 


Three members of the Boson family left Late Cornish texts from ca. 1660 to ca.
1730. They were not native speakers of the language, however, so one must be
suspicious of English influence in the texts that they produced. In the folktale
Dzhüan Tshei an Her ‘John of Chyannor’, said to have been written by Nicholas
Boson,²² one does not find diagnostic V2 structures such as Object
DP/Adverb(ial)P + affirmative particle + verb + Subject DP, but there are a number of
tokens in which an Object DP and a pronominal Subject precede the affirmative
particle a and verb, e.g.:²³


20 Of course, the English clause that he is translating, vtterly to refrayne from [eating the 
fruit], is not itself negative or V1 or even an embedded clause, as it lacks a finite verb (we note 
that Cornish has no equivalent to such a negated infinitive). 

21 Unlike Breton, in which it is permissible to formulate an SVO sentence of the type DP a zo +
verbal noun, Cornish does not have a means of constructing an SVO root clause employing
Subject DP + a yw + verbal noun; cf. n. 16.

22 Printed by Lhuyd (1707: 251-253) in an idiosyncratic orthography.

23 We preserve Lhuyd's orthography in these tokens. 

(24) a. [Kibmiaz  tég]   [ev]  a     kymeraz...
        leave     fair   3SG   PTCL  take.3SG.PRET
        ‘He took fair leave. . .’ (Padel 1975: 15 §3)

     b. Ha   [an   mona]   [andzhei]  a     gavaz;         ha   [n
        and  DEF   money   3PL        PTCL  find.3SG.PRET  and  DEF
        bara]   [dzhei]  a     dhabraz
        bread   3PL      PTCL  eat.3SG.PRET
        ‘And they found the money; and they ate the bread.’ (Padel 1975: 19
        §46)


It seems to us that clauses such as these may well be the result of L1
interference from English, but they do not provide secure evidence that, as in English,
the clausal configuration was SVO with the possibility of fronting another
constituent for topicalisation, not least because of the presence of the affirmative
root clause particle.


4.3 The Bible translations of Wella Rowe 


The Bible translations of Wella Rowe,²⁴ which date from ca. 1690, are thought to
represent some of the latest surviving works written or translated by a native
speaker of Cornish. As these are translations, one must be cautious about his Cornish
replicating the configuration of his English exemplar. One passage from Genesis 3:14,
however, may suggest that Cornish was moving towards SVO:


(25) War   tha       doer   chee  ra           moaze,  ha   douste  chee  ra
     upon  2SG.POSS  belly  2SG   do.3SG.PRES  go.VN   and  dust    2SG   do.3SG.PRES
     debre   oll  deethyow  tha       vownyas
     eat.VN  all  days      2SG.POSS  life
     ‘Vpon thy belly shalt thou goe, and dust shalt thou eate, all the dayes of
     thy life.’ (Cornish: Loth 1902: 180; trans. KJV Genesis 3:14)


This passage contains a token of OSV configuration in the Cornish text, douste
chee ra debre, which probably is intended to parallel the configuration of the
English exemplar with topicalisation, dust shalt thou eat. Interestingly, the
English text shows two tokens of V2 configuration, vpon thy belly shalt thou goe
and dust shalt thou eate, in which shalt is in second position following an initial
PP or DP, respectively; in both tokens, however, Rowe’s Cornish translation has
altered the configuration to SVO: War tha doer chee ra moaze (PP S Aux V) and
douste chee ra debre (O S Aux V). This may well comprise the best, though
hardly conclusive, evidence we have for Late Cornish moving towards
becoming SVO in affirmative root clauses.²⁵


24 Comprising Genesis 3, the ten commandments, and Matthew 2 and 4.


4.4 William Bodinar's letter to Daines Barrington 


William Bodinar's letter to Daines Barrington, written 3 July 1776, is often
credited as the last text in the traditional Cornish corpus. It is worth noting,
however, that Bodinar is not considered to have been a native speaker of Cornish
because he, as described in the letter, learnt the language as a boy from older
fishermen during expeditions out to sea:²⁶


(26) me   rig          deskey    Cornoack  termen  me   vee          mawe
     1SG  do.3SG.PRET  learn.VN  Cornish   time    1SG  be.3SG.PRET  boy
     ‘I learnt Cornish when I was a boy.’ (Pool and Padel 1975-1976: 234.7)


Bodinar employs S Aux V O configuration in me rig deskey Cornoack ‘I did learn
(= learnt)²⁷ Cornish’, which is consistent with both English SVO and Middle
Cornish V2, but note that he employs SVO configuration in the embedded clause
termen me vee mawe, though this is, perhaps, owing to the fact that he is a
native speaker of English, and so may not be diagnostic.²⁸ In his discussion of
this clause in Pool and Padel (1975-1976: 236), Padel cites a comparable token of
termen employed as a complementiser in a letter written by John Boson in 1710 in
which the configuration is also SV in the embedded clause, but, of course,
Boson also was not a native speaker of Cornish.


25 The anonymous reviewer asks whether we think that in prospective SVO Late Cornish, as
in English, the Subject occupies SpecTP and there is no V-to-T movement. Unfortunately, upon
the basis of such slight information, such a determination cannot be made.

26 We note, however, that Pool and Padel (1975-1976: 235) comment that Bodinar's “Cornish
is authentic - better than that of John Boson some sixty years earlier.”

27 Aux V in place of a conjugated verb is typical of Tregear and Late Cornish.

28 We observe that Bodinar's letter contains 12 lines of Cornish, none of which include the
affirmative particle a, which could suggest that it had become phonologically null in the
variety that Bodinar acquired, leading him to have constructed an SVO grammar. Such a small
sample is hardly probative, however.


4.5 Concluding remarks about Late Cornish


The analysis of Late Cornish syntax is inherently problematic. Aside from the 
Tregear homilies, a text in transition from Middle to Late Cornish, which, in our 
view, shows considerable interference from English, the corpus is very small 
and composed of translations by a native speaker and texts by non-native 
speakers. As mentioned in section 4.3, Wella Rowe's use of XP S Aux V in place 
of the XP V S O attested in two instances in his English exemplar appears to be 
legitimate evidence in favour of Cornish moving towards SVO in affirmative root 
clauses, but such crumbs are little upon which to hang a definitive judgement. 


5 Future work 


The preliminary remarks presented herein have sketched the broad outlines of
the diachrony of the configuration of the affirmative root clause in Cornish.
Clearly, much more work remains to be done. The next step for us, now
underway, is to create parsed corpora for Passio Christi, the central play of the
Ordinalia (ca. 1400), and Beunans Meriasek (ca. 1504) to make hard data available
for Middle Cornish. The prospect for progress on Late Cornish, barring the
discovery of more texts, appears doubtful. An edition of the Tregear homilies is
required before its syntactic structures can be systematically investigated.


Appendix I Remarks on George's “valid” tokens
of preverbal Object DP + pronominal
Subject constructions


George (1991: 216) notes that the preverbal Object DP + pronominal Subject
construction in the clauses at BMer. lines 1807-1808, 1888-1889, 3224-3225,
4340-4342, and 4515-4516 is not required to enable the rhyme. The Subject
and Verb in each token occur in the first half of a line and, thus, do not
participate in the end-rhyme pattern of the stanza. In each token, it is possible to
posit a grammatical line that would preserve V2, as well as the syllable-count
of the phrase.


- The five tokens

The relevant lines follow:


(27) a. ha   thyso   age       hanov
        and  to.2SG  3PL.POSS  name
        me   a     leuer        pur   ylyn
        1SG  PTCL  say.3SG.PRES very  fair
        ‘And to thee their names
        I will tell very fairly.’ (BMer. lines 1807-1808)

     b. ha   the       borse  mes  a   'th       ascra
        and  2SG.POSS  purse  out  of  2SG.POSS  bosom
        me   a     'm        beth           ha   th        margh  uskis
        1SG  PTCL  1SG.INFX  have.3SG.PRES  and  2SG.POSS  horse  swift
        ‘And thy purse out of thy bosom
        I will have, and thy swift horse.’ (BMer. lines 1888-1889)

     c. v     lon    bowyn  dufunys
        five  steer  beef   mince.PST.PTCPL
        y    a     depse        in  ij   deth
        3PL  PTCL  eat.3SG.COND in  two  day
        ‘Five beef steers minced
        They would eat in two days.’ (BMer. lines 3224-3225)

     d. the       volnogeth
        2SG.POSS  will
        par   del  deleth
        even  as   be.fitting.3SG.PRES
        ny   a     ra           snell
        1PL  PTCL  do.3SG.PRES  quickly
        ‘Thy will
        Even as is meet
        We will do swiftly.’ (BMer. lines 4340-4342)

     e. dadder  the  lues  huny
        good    to   many  one
        eff       a     ruk          3e  ihesu  gras
        3SG.MASC  PTCL  do.3SG.PRET  to  Jesus  thanks
        ‘Goodness to many a one
        He did, to Jesus thanks.’ (BMer. lines 4515-4516)


- Commentary

All five tokens can be recomposed as V2 clauses that maintain the correct
syllable count with the Object DP in the left periphery by employing a conjugated
verb, as in (28a, c-d), and/or placing the pronominal subject in post-verbal
position, as in (28b-e):


(28) a. BMer. line 1808:  me a leuer   >  a lauaraf
     b. BMer. line 1889:  me a'm beth  >  a'm beth vy
     c. BMer. line 3225:  y a depse    >  a thepsens y
     d. BMer. line 4342:  ny a ra      >  a ren ny
     e. BMer. line 4516:  eff a ruk    >  a ruk eff


In (28b-e), the V2 alternative requires the use of a postposed pronominal Subject
in order to maintain the necessary syllable-count.²⁹ We note that George (1991:
228-229) discusses clauses with an Object DP in the left periphery, but the tokens
that he cites are a negative clause and a wh-question and thus not relevant. It
may be that, while the preverbal Object DP + pronominal Subject construction in
earlier Middle Cornish texts existed to enable rhyme and syllable-count, as in
(14) and (17), the author of Beunans Meriasek, writing roughly a century later,
simply preferred not to employ V2 structures with only the Object DP in the left
periphery. Clearly, further research making use of multiple texts is required,
which we intend to take up in the future.

We note that me a leuer in (28a) is a separate case, since a lauaraf with
conjugated verb provides the necessary syllable-count itself, i.e. without an overt
postposed pronominal Subject. We call attention to the fact, however, that me
a leuer occurs earlier in the stanza, as illustrated in (29), so perhaps the author
chose to employ the same phrasing, as parallelism is a well-known feature of
literary language (Fabb 1997: 137-164 et passim).


(29) Nyns o an rena dewov ‘Those were not gods, 
me a leuer costentyn I say, Constantine. 
ij abostel caradov Two beloved apostles 
y o 3e crist cuff colyn They were to Christ the dear heart. 
Myr age ymach heb wov Behold their images without a lie 
mars yns y havel certyn Whether they are like them certainly, 
ha thyso age hanov And to thee their names 
me a leuer pur ylyn I will tell very fairly.’ 


(BMer. lines 1801-1808) 


29 In (28b), the postposed pronominal vy in fact cross-references the object agreement affix.


Appendix II Illustration of John Tregear's
translation practice


Grey highlighted text = Bonner's English not translated by Tregear

Bold text = Bonner's English altered or replaced with an equivalent phrase by
Tregear

SMALL CAPITAL text = additional words or phrases added by Tregear

Single underscore = English word that had already been borrowed directly into
Cornish or that was left untranslated by Tregear

Double underscore = English word (or word previously borrowed from English
into Cornish) added by Tregear in his Cornish translation


- Tregear Homilies, folio 1, lines 1-5:

The Prophette, Dauid in his fore score and nintenth
IMA          an   profet   dauit  i   'n   peswar  vgans  ha   |  nownsag³⁰
be.3SG.PRES  DEF  prophet  David  in  DEF  four    score  and     19

psalme, exhorting all people to synge prayse
psalme,  |  ow    exortya      oll  AN   bobyll  the  ry       prayse  HAG  HONOR
psalm       PROG  exhort.VN³¹  all  DEF  people  to   give.VN  praise  and  honor

to almighty god, and to serue him in gladnes, and
the  du   |  HA   th  'y             servya    in  lowendar,  ha
to   God     and  to  3SG.MASC.POSS  serve.VN  in  gladness   and

reioyse in his sight,
GANS  PERFECT    COLONOW  THE  reiosya  |  in  sight  AGAN      CREATOR  HA
with  perfect³²  hearts   to   rejoice     in  sight  1PL.POSS  creator  and

alledgeth thys
REDEMAR.  yma          AN   PROFET   DAUID  ow    |  allegia    helma
redeemer  be.3SG.PRES  DEF  prophet  David  PROG     allege.VN  PROX

as a sufficient cause thereof.
kepar  (ha)  del³⁴  EW-A                  sufficiant  cawse  AGAN      REDEMPCION
as     (as)  (as)   be.3SG.PRES-3SG.MASC  sufficient  cause  1PL.POSS  redemption


30 Note that Tregear here employs the cardinal, rather than the ordinal, numeral.

31 Tregear has converted this phrase, which uses a present participle in English, to a Cornish clause
with a progressive construction (i.e. a periphrastic present as general present tense; cf. colloquial
Modern Welsh). He has also broken what is one sentence in Bonner's text into two sentences
in Cornish (each using the periphrastic present tense), with the break coming in line 4.

32 Note the use of the preposed adjective, as per English. This is not unusual when Tregear
employs English adjectives to modify a noun, even when the noun itself is translated into Cornish.
Note also that the adj. perfect does not appear in Bonner's English text. It is an addition by
Tregear, who seems to enjoy embellishing and expanding upon Bonner's text as he translates. A
comparable example of an untranslated English Adj + N phrase is sufficiant cawse in line 5.


- Tregear Homilies, folio 4, lines 2-6: 

onely? one kynde of fruyte he charged 
Saw vn kynda a frut | AN TAS DU Pa chargias 
only 1 kind of fruit DEF Father God PTCL chargessc puer 


hym vtterlye to refrayne from, on 
MABDEN NA RELLA MYLLYA | NA TUCHIA WORTA war 
mankind NEGsus dOssG;wppsug; meddley, nor touchyy atssc.masc upon 
payne of death, (and that not of the body 
|“ bayne MERWALL a vernans henn 0 MERNANS an corfe 
pain dyingy, of death DIST beasc.wer death DEF body 
alone, but of the soule also) which was the fruyte of 

ha "n ena | inweth henna o a fnt a 


and DEF soul also DIST  be3g¢.;mpp DEF fruit of 


33 Note that the Cornish repeats the preposition the 'to' of this infinitive construction, al- 
though to is omitted in the English exemplar. 

34 Kepar ha (as originally written) is equivalent to English ‘as’ + DP, e.g. ‘as a sufficient 
cause,’ but the usual way to say ‘as a cause’ would be avel + DP, not kepar ha + DP, which 
usually means ‘just like...’ or ‘even as...’ Even more surprisingly, Tregear has altered it 
here to kepar dell, which is equivalent to English ‘as’ + V, which requires him to introduce a 
new verb ewa ‘it is’ and add a subordinate clause. 

35 Or ‘the sufficient cause of our redemption,’ depending upon how we interpret this ambiguous
Cornish phrase. Presumably, Tregear felt it necessary to replace ‘thereof’ with a phrase
that specifies ‘of our redemption,’ even though this seems to change the meaning of the text 
from ‘cause to sing praise to almighty God’, which is how we understand Bonner’s English 
text. 
the tree called in scripture the tree of knowledge
"n wethan (gylwys in scripture) |° an wethan a wothfes
DEF tree callPST-PTCPL in scripture DEF tree of knowVN

of good, and euyl.
a "N da ha "N drog.
of DEF good and DEF evil
65-66, 69, 72

configuration/constituent order 17, 19, 
44-5, 47, 152, 313-32 

contracted verbs 251-2 

convergence 270, 302, 308, 312 

corpora, treebanks and texts 15-26, 33, 
45-7 

- dictionaries 19, 22, 49, 61-5, 70, 79-80,
83

- Irish 1-3, 63-4, 67, 78, 85-9, 94-5,
101-2

- Late Cornish 327-32

- Middle Cornish 325-7, 332

- Welsh 27-9, 46-7, 272, 278-9, 281-2,
286-90, 296-7, 311


deictic particle 130, 136-7 
dialects 188-92 
figura etymologica 153, 196-7, 220, 248
finite-state morphology/transducer 21, 
23-4, 66-73, 75-6, 78-83 


information structure 17-8, 20, 29-30, 33, 
36, 41, 44-5, 47 

intensifier 269-78, 280-5, 287, 293-7, 299,
303-5, 307, 309-11 


left periphery 314-5, 317-9, 321-2, 324, 334 


machine learning 24, 66, 79, 85, 87, 89, 94, 
103-4 

morphological analyser 23, 49-50, 65-8, 
82-3 

morphosyntax, morphosyntactic 
annotation 15, 17, 19, 22-3, 28, 31-3, 
37, 40-1, 47, 54, 65, 154 

mutations 28-9, 32-3, 40, 56, 90, 92, 
145-6, 189, 239, 250 

- lenition 252

- nasalisation 56, 58, 120, 145-6, 152-3,
165-7, 179-93, 195-8, 202-3, 205-25, 
227-33, 237, 250, 252, 256-8, 261-3 


natural language processing (NLP), 
toolkit 22, 28, 31, 33, 41, 50, 61, 64, 
66 

NLP pipeline 29-45, 64

- annotation 15-47, 63-4 

- lemmatisation/lemmatiser 16, 19, 25, 
64-7, 71, 79-82 

- POS tagging/tagger 15-6, 21-40, 44-7, 
64-8, 75, 81-3, 89-90, 101, 104 

- pre-processing 29-32 

- syntactic parsing/parser 15-6, 21-2, 
25-31, 35-7, 41, 43-7, 52, 62, 64, 67-8, 
70, 75-6, 78-9, 332 

- tokenisation 26, 31-2, 47, 64

nota augens 56-60, 115-6 


orthography 25, 28, 31-3, 40, 64, 71, 82, 92, 
115, 250, 329 
- standardisation, normalisation 19, 24-5,
28-30, 33, 47, 50-1, 64-7, 71, 79-82, 
90, 92-3, 104, 189 


paradigmatic split 173-5 

particle theory/‘Cowgill particle’ 239-40, 
246-8, 263-5 

phonological changes 

- depalatalisation 166-9 

- j-apocope 239-40 

- loss of nasals 189, 192, 196, 229 

- pretonic e > a 242

preverbs 37-8, 52-8, 60, 70-2, 76-8, 80-1, 
143-5, 147-51, 155-71, 174-6, 214, 
220-1, 225-6, 231, 239-8, 261, 263-5, 
308, 310 

pronouns 34-36 

- demonstrative, Old Irish 115-141, 153

- infixed (general) 56, 86, 90, 119, 246, 261,
271, 288, 294-5, 309-11

- infixed, Class A 56, 143-55, 162-7, 169,
172, 175-6, 216-9, 221, 226-7, 229-31,
233, 244, 250

- infixed, Class B 56, 143-67, 169-70,
172-7, 216-8, 220-2, 229, 233, 244, 
250 

- infixed, Class C 56, 143-7, 149, 151-67, 
169-76, 218-22, 227, 230-1, 233, 250 

- prepositions with pronominal 
objects 34-35, 38, 90, 121, 127-8, 134, 
166-8, 179-180, 182-3, 187, 191, 243,
281, 283-5 


reflexive pronouns/markers, 
reflexivity 269-78, 281-2, 286-312
relative clauses 58, 143-7, 151-3, 162-3, 
165-7, 169-72, 175, 195-237, 248-262 
- leniting relative clauses 166, 169-70, 196,
201, 218-23, 225, 232-3, 237, 250, 252, 
254, 256, 258-61, 263 
- nasalising relative clauses 58, 152-3, 
195-233, 237, 249, 252, 256-8, 262-3 
relative marker 76, 78, 174, 230 


statistical methods 22, 24, 65-6, 85-6, 
89-93, 99-100 

- Fisher's exact test 241 

- F-scores 65

- k-medoids 91-104

- Principal Component Analysis 93-4

- ten-fold cross-validation 39

syntactic raising 206, 223-4, 228, 231, 233, 
237 

syntax, syntactic annotation 16-21, 23-4, 
29-31, 36, 41-3, 45-7, 64, 143, 146,
152-6, 160-1, 171-2, 176, 179-80, 
183-6, 293-7, 308, 313-32 


verbal complex 52-60, 70-1, 75, 143, 
145-55, 166, 168-9, 171, 176, 223, 
243-5, 248 


XML, TEI XML 20, 30, 43, 46-7, 63, 86, 88-9 
Old and Middle Irish

a" ‘when’ 228-9
ad- 147, 151, 160, 220
aisndís, aisndisse 186
aith- 163-4
améin/amné 258, 263
ar-, ara- 168, 226, 233, 244
arindí 257
as-, asa- 147, 244, 255
as-beir, -epir 73-4
as-oilgi 70
at-tá 234-7
beirid 75
bete 237
ceta-, cita-, ciata- ‘first’ 239-48, 263-5
ceta- ‘along’ 241, 246-8, 264-5
cian 205-6, 225
co" 215
con- 151
conid- 174
cruth, in chruth 228-9, 249
-d'- 152
de/i- 150
di-sruthaigedar, disruthigthe 255-6
do- 150
dó 252
do-esta 169
do-léici, -teilci, -tarlaic 71-3, 76-7
dominge 236
etar, itar 242-4
fo-fera 169-71, 173, 176
forngaire 186
frecndairc, frecndarcus 186
fris- 160, 174-5, 221
frit-racatar 172
í 130, 136-7
imm-, imma- 244
immaircide 258-9
in(d)-, inda- (preverb) 244
in(d) (adverbial particle) 214-5
léicid 68, 73-4, 76-7
ocu- 246-8, 264
ol sé, ol sí 137
-s (infixed pronoun as relative marker) 230
samlaid 258, 263
sé, sí (personal pronoun) 137-41
sin, sen, sein, sain 115, 117-24, 131-4, 136-7, 141
-sin 115-6
síu 128, 130, 136
só, sé, séo (demonstrative pronoun) 115, 117, 124-31, 134-5, 138-40
-so, -se 115-7
sommae 236-7
sund 129
túailnge, túailngigidir 186


Welsh

efo 280-1
hun(an) 269-70, 276-312
sef 30
ym- 270-1, 274-7, 286-93, 296-7, 300-2, 307, 310-1
ynteu 280


Breton

em- 308
en, em 308


Latin

deus 173-4
diuus 173-4
ipse 305-7
se 274, 291, 303-4, 306-7
te 304-5


Proto-Celtic 

*ambi 276 

*enter 242-3 

*eti 240, 246, 263 
*kanta 241, 247, 264 
*kentu- 240, 242-8, 263 
*onku- 246, 248, 264 


Proto-Indo-European 
*s(u)e- 269-70, 274 
Germanic
hine, hem, him (Old and Middle English) 271-2, 284
selbst (German) 269, 273, 276-7, 310-1 
-self (English) 269-74, 276, 284, 310-1 
ser, sik (Old Norse) 270 
sich (German) 269, 273, 275, 277, 292, 311 
sih (Old High German) 270 


sik (Old Saxon) 270 
sik, sis (Gothic) 270 


Romance 

me (Old French) 307 

se (French) 269, 274-5, 308
se- (Surselvan) 274 

si (Italian) 269, 274 


